#### premio tesi di dottorato

– 44 –

# PREMIO TESI DI DOTTORATO

# Commissione giudicatrice, anno 2013

Luigi Lotti, *presidente della Commissione*

Tito Arecchi, *Area Scientifica* Franco Cambi, *Area Umanistica* Paolo Felli, *Area Tecnologica* Michele Arcangelo Feo, *Area Umanistica* Roberto Genesio, *Area Tecnologica* Luigi Lotti, *Area Scienze Sociali* Mario Pio Marzocchi, *Area Scientifica* Adolfo Pazzagli, *Area Biomedica* Mario Giuseppe Rossi, *Area Umanistica* Salvatore Ruggieri, *Area Biomedica* Saulo Sirigatti, *Area Biomedica* Piero Tani, *Area Scienze Sociali* Fiorenzo Cesare Ugolini, *Area Tecnologica* Vincenzo Varano, *Area Scienze Sociali* Graziella Vescovini, *Area Umanistica*

# **Coarse-grained molecular dynamics and continuum models for the transport of protein molecules**

Firenze University Press 2014

Coarse-grained molecular dynamics and continuum models for the transport of protein molecules / Marco Bacci. – Firenze : Firenze University Press, 2014. (Premio Tesi di Dottorato; 44)

http://digital.casalini.it/9788866556947

ISBN 978-88-6655-693-0 (print) ISBN 978-88-6655-694-7 (online)

#### *Peer Review Process*

All publications are submitted to an external refereeing process under the responsibility of the FUP Editorial Board and the Scientific Committees of the individual series. The works published in the FUP catalogue are evaluated and approved by the Editorial Board of the publishing house. For a more detailed description of the refereeing process we refer to the official documents published in the online catalogue of the FUP (www.fupress.com).

#### *Firenze University Press Editorial Board*

G. Nigro (Co-ordinator), M.T. Bartoli, M. Boddi, R. Casalbuoni, C. Ciappei, R. Del Punta, A. Dolfi, V. Fargion, S. Ferrone, M. Garzaniti, P. Guarnieri, A. Mariani, M. Marini, A. Novelli, M. Verga, A. Zorzi.

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0: http://creativecommons.org/licenses/by/4.0/)

**CC** 2014 Firenze University Press Università degli Studi di Firenze Firenze University Press Borgo Albizi, 28, 50122 Firenze, Italy www.fupress.com *Printed in Italy*

Ai miei zii, zie, cugini e cugine

René Descartes

"Do not trust any existing theory. Start to think independently from as fewer intuitive concepts as possible to form your own conclusion."

"Do not trust any existing theory. Start to think independently from as fewer intuitive concepts as possible to form your own conclusion." René Descartes

Ai miei zii, zie, cugini e cugine

# **Table of contents**

**Contents**


Coarse-grained molecular dynamics and continuum models


# **Abstract**

In the present thesis, coarse-grained molecular dynamics simulations and relevant continuum models are investigated. In the first part we present a computational analysis on the driven transport of Maltose Binding Protein (MBP) across nano-channels in the framework of coarse-grained modeling. The work is motivated by recent experiments on voltage driven transport of MBP across nanopores exploring the influence of denaturation on translocation pathways. Our simplified approach allows a statistical mechanics analysis of the transport phenomenology which is useful to investigate the influence of denaturation from a structural point of view. Numerical results are qualitatively in agreement with experimental data. They reproduce a rich phenomenology, from fast transport to long blockades (Fig. ) and, to some extent, bumping events. Specifically, we identify and characterize short and long channel blockades, associated with the translocation of denaturated and folded MBP structures, respectively. We show that long blockades are related to stall events where MBP undergoes specific and reproducible structural rearrangements. To clarify the origin of the stalls, the

Figure : Stall snapshots during traslocation of folded MBP, stastistics of residence time and native contact map.

stick-and-slip translocation is compared with mechanical unfolding pathways obtained via steered molecular dynamics. This comparison clearly shows that translocation pathway differs significantly from free-space unfolding dynamics and strongly suggests that stalling events are preferentially determined by MBP regions with higher density of long-range native interactions. This result might constitute a possible criterion to predict a-priori statistical features of protein translocation from structural analysis.

Customarily, protein dynamics is summarized in low-dimensional models where only the evolution of few order parameters is considered as an appropriate description of the complete dynamics of the atomic cluster constituting the molecule. We show that long stall events and main translocation features are essentially maintained in the D description of the process selected in the present work, where the evolution of an appropriate collective variable and the relevant free energy landscape are considered. Moreover, we show that both free energy barriers and translocation bottlenecks are strongly associated with the structural properties of the folded proteins. In particular, free-energy ramps systematically correspond to the regions along the backbone chain with greater density of long-range native contacts with the untranslocated portion of the protein itself. These areas are also responsible for the stalls during nanopore transport. Also, this knowledge allow us to decompose the free energy trends into the different contributions of internal energy and entropic effect, pointing out the dominant terms. Here, results relevant to two different proteins and to a mutant one are considered and discussed.

In the second part of the thesis, we enrich the standard D continuum view by proposing a continuum model for the description of the dynamics of isolated macromolecules. We adopt a second-rank tensor as a descriptor of the macromolecular shape and identify the action governing its dynamics by means of an identification procedure from a discrete scheme, based on power equivalence and Cauchy-Born rule. We compare molecular dynamics stretching simulations with the continuum model by starting from discrete toy schemes, going on with increasing complexity, and ending with the analysis of the Ubiquitin protein. The results indicate limitations in the approach in case of unconstrained molecular dynamics while they show appropriateness for driven dynamics (Fig. ). Also MBP is considered by addressing primarly translocation simulations, a particular constrained dynamics where the selected approach confirms its appropriate-

Figure : Snapshots of molecule Ubiquitin and relevant morphological descriptor during a mechanical pulling simulation.

ness.

Finally, to overcome the shortcomings stemming from possible lack of homogeneous deformation Cauchy-Born rule is based on, we investigate some alternative heuristic approaches, which appear adequate in describing the protein shape evolution also in uncostrained dynamics, such as free equilibrations.

# **Acknowledgements**

First and foremost, I wish to acknowledge sincerely my advisor, Prof. Paolo Maria Mariano. Motivations to my gratefulness are several, and start earlier than the research activity here presented, when just after the master degree, I asked him for a piece of advice to pursue the deep interest I have always felt for biomechanics. I can state, the PhD. days have been tough for me, nevertheless for him, supporting my work and intellectual growth from the very beginning and providing me with many opportunities, even beyond the mere scientific context. He has never lacked of attention and real passion toward our work and the academic world. In these last days his detailed revision to this manuscript has been crucial and tireless.

I also wish to express my gratitude to Prof. Roberto Genesio and Prof. Stefano Ruffo, for the support during the long journeys abroad and for the interest they have always shown in the research activity I have been involved in.

This research project would not have been possible without the support of many people. Profound thanks are directed to the research group in Rome, Dr. Fabio Cecconi, Prof. Carlo Massimo Casciola and Dr. Mauro Chinappi. I have performed a considerable part of the project in collaboration with them, who in turn have always considered me as a full member of the group, carrying out stimulating and challenging discussions, which can not be neglected neither from a scientific nor from a personal point of view. I will remember the experience in Rome as a wonderful period, where I started to focus on the research proposal in a stimulating and friendly environment, the so-called 'aquarium'. Really, a special thank, again, to Mauro Chinappi, he taught me things I should have learnt before, with unfailing patience and competence.

I am grateful to Dr. Sandrasekaram Gnanakaran, Dr. Giovanni Bellesia and to the whole T in Los Alamos National Laboratory. Despite the short visit, I have been involved in meetings and discussions that considerably fostered the second part of the research project about continuum modeling. Also, feeling at home in such a far away country is not for granted and the friendly environment of the Lab, despite what one might wrongly believe, played a substantial rôle.

I would like to gratefully acknowledge Prof. Charles S. Peskin and the Courant Institute in New York University. By far the toughest days during the PhD., as it has to be in such a mathematical school. Thanks to Prof. Peskin I had the opportunity to present our scientific results to a selected group of scientists, who have not been short of comments and hints for future research (finger crossed). However, above all, I have had the possibility to interact with a person who undertakes his job driven by passion, thus inspiring an otherwise impossible serenity and simplicity to all the ones who have the chance (and luck) to work with him. A big 'thank you' also to Giulio Trigila, Abba and Shoshana Leffler, Adam Stinchcombe and friends, colleagues and staff at NYU.

Mentioning the New York City days, I can not forget my girlfriend Jocelyn. That is another story, I know, and deserves another stage. Let me just say I am impressed and thankful to her determination and strength. / working schedule for both of us, and not a complaint. I received only sympathy, sharing, admiration and love. Many many thanks to Donna and Alex, Angie and Willie and, of course, Reyes and their families, who have welcome me as if I had always been there.

Thank you zio Gigi, always so eager to interact and work with me, another old-fashioned person that goes on for true passion.

My family, I have been far apart for some time, but it has always been like having you close by. I have always felt a deep, sincere and excited interest for what I have been on. Thank you, knowing your estimation for my commitment allowed me not to give up.

Moreover, I wish to thank my friends, a bearing column where to rest, take a break and dream.

Finally, I thank the personnel of University of Florence, and the Ministry of Public Education for the scholarship I have benefitted from.

# **Chapter Introduction**

#### **.Preamble**

The present dissertation is supervised by Prof. Paolo Maria Mariano of University of Florence and Scuola Normale Superiore of Pisa.

It is developed in collaboration with Prof. Carlo Massimo Casciola and Dr. Mauro Chinappi of Sapienza University of Rome and Dr. Fabio Cecconi, Institute of Complex Systems, National Research Council of Rome.

The focus of the work is on the transport of proteins across nanopores, a process called translocation. The work is based on coarse-grained molecular dynamics simulations exploiting the so-called Go-like model, [ ¯ , ]. Equilibration, mechanical stretching and transport simulations are developed.

A standard D continuum approach based on the Langevin equation in the potential of the mean force relevant to a suitably-defined reaction coordinate is investigated. The main aim is to establish its suitability to retain and interpret the complex three-dimensional molecular dynamics phenomenology.

Non-conventional continuum modeling is also considered. In this perspective a second rank tensor is defined in order to summarize the three-dimensional shape evolution of clusters of atoms in space, in turn undergoing several different dynamics. The analysis is performed in light of the mechanics of complex materials [].

#### **.Translocation in biology**

Translocation, namely the passage of molecules through biological pores, is a fundamental process occurring in many natural phenomena, []. It involves the exchange of proteins, ions, energy sources, genetic information and any particle or aggregate that plays a role in cell functioning and living. In particular, biopolymer translocation across membranes is ubiquitous in biology and includes the passage of RNA strands inside nuclear pores [], absorption of oligonucleotides on suitable sites [], transport of proteins back and forth from the cells [], mitochondrial import, protein degradation by ATP-dependent proteases and protein synthesis, [, ].

Cells are filled with all kinds of nanopores, committed to an overwhelming range of functions. Proteins hold an important role in transport processes across cell membranes. Indeed the latter are generally composed of phospholipid bilayers and the channels across them by transmembrane proteins. The transport can be either passive or active. The former one can take place by diffusion or by interposition of carriers to which the transported substances bonded. In the active case instead, transmembrane proteins use the Adenosine-Tri-Phosphate chemical energy to drive a molecule against its concentration gradient. A mechanism of selection, based on the interaction between signal recognition particles and relative receptors, allows the transmembrane proteins to detect the system to be transported [].

As it will be described in Sec. , the discovery of the protein nature of transmembrane pores fostered the synthesis of conductive channels to perform in vitro translocation experiments. Although the first pioneering experimental work on a transmembrane biopore embedded in an artificial lipid bilayer, both housed in a microfluidic cell, dates back to the 's [], clarifying the physical and chemical phenomena involved in nanopore transport will remain a fertile field of research for the years to come, as it is only recently that scientists have developed suitable techniques to adequately tackle this issue.

Proteins are not only the basic constituent of transmembrane channels, but are also commonly transported between organelles and cells. Very often membrane apertures are not large enough to let the proteins enter in their native (folded) conformations. For example, the narrowest constriction in the proteosome organelle is only Å in diameter. It is thus necessary for the proteins to unfold in advance or during their passage across the pore []. Such co-translocation unfolding rate is orders of magnitude faster than chemical or thermal denaturation, suggesting a different unfolding mechanism [, ]. A natural and appealing explanation of this gap is to suppose the process to be driven by electromechanical forces.

# **..Protein molecules**

Proteins are complex polymers composed of smaller elementary molecules called amino acids. The specific monomer sequence determines their threedimensional structure, which is in turn responsible for the biological function of the protein itself. Proteins influence a large number of phenomena in life science. For instance, some of them are involved in the transmission of signals between cells and tissues (hormones), other ones support the immune system against pathogens (antibodies); moreover enzymes cause or foster the onset of chemical reactions otherwise impossible at physiological conditions. In addition, proteins are responsible for the regulation of gene expression by bonding with specific nucleic acid targets. Structural proteins are fundamental in converting chemical energy into mechanical work in muscles, besides constituting

part of the scaffold of tissues such as hair, nails, tendons and bones. Finally, particular proteins store and transport particles, control the passage of molecules across membranes and drive the electron current in plant photosynthesis. Proteins can be subdivided in two main categories:


The former are the majority, are soluble and their three-dimensional structure is almost spherical. The latter are insoluble and do not have a predetermined shape, although often are rod-like.

The conformation that allows a protein to comply with its biological task is said native configuration, or also folded, natured or compact state. There are several techniques to achieve protein denaturation. Unfolded states can be obtained, for example, either thermally, chemically or even mechanically, as in atomic force microscope experiments or by using optical tweezers.

In nature there exist different standard amino acids that constitute, as already mentioned, the backbone chain of the protein molecules. Protein size ranges from up to more than amino acids. However, the latter can bond one another forming also small peptides or polypeptides with no specific threedimensional shape. Nineteen amino acids have the same general structure (depicted in Fig. .) and differ only in the chemical composition of the side chain **R** (residue chain). The side chain of the th (prolyne) is standard, but it is bonded to a nitrogen atom instead than to a carbon atom. The latter is called α-carbon atom. Other carbon atoms along the side chain are termed β, γ, δ, ... accordingly to their position with respect to the first one. Very often amino acids are simply called residues in the biological parlance.

Figure .: General representation of amino acids. Main components are highlighted and labeled.

#### **.Voltage-driven experiments and nanopore technology**

The basic apparatus to perform voltage-driven translocation experiments is constituted by an electro-chemical micro-fluidic cell, as the one depicted in

Fig. . panel a, which represents the evolution of the Coulter resistive counter []. In such a device two chambers (cis and trans), which contain a buffer so-

Figure .: Snapshots of panels d and c are obtained using VMD software []. Panel a: Sketch of a microfluidic experimental apparatus for voltage-driven translocation experiments. A magnification of the sensing area of the device (nanopore) is also provided. Panel b: Standard ion current blockade event at the passage of a molecule across the channel. Panel c: Side view of the α-hemolysin transmembrane protein. Panel d: Characteristic eptametric structure of the α-hemolysin. This toxin is in fact composed of seven elementary proteins arranged in a circular fashion.

lution, are connected by a pipe that also hosts the sensing area, i.e. a membrane where a nanopore is inserted (single nanopore systems). An applied voltage, usually of the order of some mV , generates an ionic current through the channel that can be recorded by standard electro-physiological techniques like patchclamp []. Once biopolymers (with a net charge at the pH of the solution) are introduced (on the cis-side), they cross the pore driven by thermal fluctuations (in the approaching to the pore) and by the voltage (across the pore itself, where the majority of the voltage drop takes place) and flow into the opposite chamber (trans-side). The passage temporary clogs the channel and provokes a detectable ion current drop, Fig. . panel b, which strongly depends on the chemical and physical properties of the molecule that occupies the pore. For this reason, nanopore systems can work as efficient devices to operate at the molecular level in order to characterize several different biological components []. Nanopore-based technology is believed a promising resource offering powerful tools for detection [], manipulation and sequencing of macromolecules [],

although meaningful data analysis still remains challenging []. Moreover, the duration of the current drop is a direct measure of the translocation time, which is in turn used to characterize a single transport event, along with the intensity of the blockage itself. Clusters of events, usually coupled with changes in external conditions such as temperature or salt concentration, can as well provide information about the composition of the solution.

In voltage-driven translocation experiments, a widely exploited transmembrane protein is the well-known α-hemolysin (αHL), a β-barrel toxin expressed by the the very common bacterium Staphylococcus Aureus []. αHL is very stable in-vitro and spontaneously migrates into lipid bilayers, considerably simplifying the building up of the experimental apparatus. Therefore it has been employed in several cases both for DNA/RNA strands and for protein translocation [, , –]. Its crystallographic structure was determined with a resolution of . Å [] and a schematic view is given in panels c and d of Fig. ..

Another biological nanopore that worth to be mentioned for similar applications is the Aerolysin channel [], recently used by Pastoriza-Gallego et al. to study the translocation of a macromolecule called Maltose Binding protein [, ].

Other pioneering experimental studies concerned detecting polypeptides not only in the pore lumen but also in the aqueous phase. This analysis has been achieved for the first time by Movileanu and co-workers [, ], who were able to reveal the presence of a particular protein in the proximity of the biopore by attaching a flexible linker within the large cavity of the α-hemolysin pore with a specific ligand at its end. Inspired by this work, Kong and Muthukumar performed Langevin molecular dynamics simulations and Poisson-Nerst-Plank calculations to model the fluctuations of the linker inside the vestibule [], elucidating the dynamics of the single-molecule captures in the aqueous phase. Even if several other experiments worth to be referred, it is evident that the fundamental analyses just mentioned open the way to theoretical modeling and numerical investigation of an enormous amount of phenomena. Moreover, as Movileanu delineates [], nanopores represent single-molecule probes with several possible applications. They can be employed to reveal many features of important biomolecules like folding state, backbone flexibility, mechanical stability, binding affinity and charge distribution, in addition to the possibility of revealing the biophysical rules that govern the transport through transmembrane channels. However, the main idea that fostered the development of voltage-driven translocation experiments, since the very first application, has always been the one to identify and fast-sequence the molecule that is clogging the throat section of the channel. Although the original approach concerned only DNA/RNA strands, protein sequencing is currently deemed even more interesting than DNA characterization. Indeed, whereas several fast and inexpensive techniques to perform DNA read-out already exist, the achievement of information on protein structures and amino acid order through standard experimental approaches such as xray crystallography or nucleic magnetic resonance is still challenging and timeconsuming. However protein sequencing is even more complex than DNA for several reasons. The latter in fact is linear and homogeneously charged, differently from a structured polymer. Also, the ability to discern between different elementary molecules has to be superior for protein characterization. Indeed DNA or RNA are constituted by only five different types of oligonucleotides while the basic constituents of protein molecules are the standard amino acids.

Several groups have already investigated the interactions of small polypeptides, including α-helical [–] and β-barrel [] hairpin polypeptides, or even whole macromolecules [, , ] with the αHL or anthrax pores. A crucial step toward single-molecule differentiation and sequencing has been recently carried out by Talaga and Li []. They have distinguished from each other different proteins in solution by modeling the characteristic ion current drop across a solid state nanopore through a simple Ohmic term, multiplied by a corrective factor that takes into account the relative geometry of the protein and the channel. This approach will be probably delved deeper in the near future to take into account other physical and chemical aspects of the molecules and of the whole system.

As already stated, first experiences were carried out by exploiting biological pores but, as mentioned in the last example, recent advances in technology have opened new possibilities toward the use of artificial solid state nanopores.

A great advantage of protein biopores is that they can be chemically engineered with advanced molecular biology techniques, such as mutagenesis []. Using this approach a wide variety of α-hemolysin biosensors were developed, especially to probe binding affinities [, , ] and to reach discrimination at single-base resolution on isolated strands of DNA [, ]. Moreover, the geometry of a biopores is well-known and is highly reproducible, that is, once it has been determined by using, for example, x-ray crystallography, it can be assumed to be the same for all the biochannels of that particular type. Finally, toxin pores like αHL or anthrax [, , ] embed spontaneously in lipid bilayer membranes. However they exhibit a number of disadvantages too. Typically the pores, but mainly the membranes that host them, can become unstable if changes are produced in the external environment in terms of temperature, pH or denaturant concentration (used to unfold the proteins in solution). Also, it is very difficult to imagine portable devices which sensing area is made of biological pores, due to instability of, above all, the lipid bilayer. In addition, their fixed size can also represent a limit, because geometry and dimensions cannot be macroscopically changed to address specific issues.

Solid state nanopores present some advantages over their counterparts, such as increased reliability and lifetime, control over surface properties and pore geometry (diameter, length and shape), compatibility with existing semiconductor and microfluidic fabrication techniques with thus potential integration into ultra-high throughput devices [, ]. In fact, nanopore-based DNA analysis

Figure .: Transmission electron microscope images of a solid state nanopore. Adapted from [].

has the potential to carry out a range of laboratory and medical experiments even faster than current methods, also reducing the cost (fast DNA sequencing is an application of nanopore based sensing that took about years to transfer the early insights into a commercial device, announced by the end of []). Figure . shows TEM images of silicon nanopores. The first synthetic biosensor was developed by Charles Martin's group in and consisted of a single conical gold nanotube []. Since then, several teams have pursued this route, demonstrating the capability of solid state nanopores to perform single-molecule detection and to probe several characteristics of the proteins [, –].

Finally, as speculated by Branton et al. [], the possible embedding of a biological pore into a semiconductor membrane represents a fascinating opportunity that should be carefully considered and evaluated.

#### **.Outline of the thesis**

The thesis is organized as it follows. In chapter an overview on the state of the art of voltage-driven translocation experiments on proteins is provided, along with a summary of numerical investigations concerning coarse grained molecular dynamics. Specifically, the work by Oukhaled et. al [] is described, since it is the experimental analysis that has motivated a substantial part of the numerical simulations performed during the present research activity. Also, numerical analyses [, ] are outlined for can be deemed the reference works on the topic available at the beginning of the thesis. Chapter shows molecular dynamics results on protein stretching and translocation []. Chapter is dedicated to standard one-dimensional modeling of the translocation process, in terms of driven diffusion of a proper reaction coordinate in the associated potential of the mean force []. Main theoretical aspects are also considered along with the exposition of the numerical results obtained. Finally, the last chapter contains the three-dimensional continuum analysis performed to enrich the just mentioned standard view. The analysis is performed in light of the mechanics Coarse-grained molecular dynamics and continuum models

of complex materials [] and it is in particular inspired by two previous works [, ].

# **Chapter State of the Art**

# **.Preamble**

This chapter principally reviews the main experimental work that constitutes the baseline of our research activity, to which the whole section is dedicated.

However, a more general description of both experimental and numerical investigations is provided in section , where also a popular coarse grained model somewhat similar to the Go model is outlined. It is worth to anticipate ¯ that the Go model, sec. ¯ ., constitutes the framework selected for the numerical analyses presented in this thesis. Coarse grained models considerably reduce the computational time with respect to full atoms simulations, which are not in turn considered in the present setting since typical translocation time scales are yet not accessible, unless the pulling force is order of magnitudes greater than the biological ones [–]. In addition, the extrapolation of these results to the lower-force regime is a very difficult task. Moreover, the most recent and remarkable studies refer primarily to the analysis of folding/unfolding pathways [–]. The advantage of a reduced complexity with respect to full-atomistic techniques relies on the possibility of massive sampling of events, thus allowing a statistical mechanical description of translocation in terms of ensemble averages, as it will be shown in chapter .

Finally, the last section summarizes two numerical investigations of protein transport across nanopores that can be considered as the reference works in the minimalist molecular dynamics setting for protein translocation available in the literature at the beginning of the research activity.

# **.Experimental work: a selected example from the literature**

Stochastic biosensing has been exploited for detection of not only small molecules or DNA strands but also to study long and complex biopolymers. Since our interest is focused on protein experiments and modeling, all the wide literature on oligonucleotides, although extremely interesting, is not further considered.

The transport of a folded polypeptide is characterized by a greater energetic barrier than for an unfolded structure. According with this argument, a recent study combined single-molecule electrical measures with simulations based on Langevin dynamics to show that highly unfolded β-hairpin polypeptides enter the pore in a linear fashion, pursuing fast single-file translocations. In contrast the passage of structured β-hairpin polypeptides occurs more slowly, producing longer current blockades [].

Similarly, Oukhaled and co-workers [] studied the translocation of folded, partially folded and unfolded structures of the Maltose Binding protein (MBP) across the αHL channel (Fig. .). The unfolded structures were accomplished in high concentrations of guanidinium hydrocloride, a chemical denaturant commonly exploited in protein folding analyses. It is worth to disclose that this experimental work constitutes the starting point for the numerical investigations presented here. In this experiment, translocation has been studied by varying the denaturant concentration in the buffer solution, highlighting that current blockades were dependent on this parameter, both in terms of frequency and duration. The authors have observed, among other things, that folded proteins did not translocate across the biopore. The radius of gyration of the native MBP is indeed greater than the αHL lumen and the driving force was not strong enough to accomplish the protein unfolding at the pore mouth (the voltage required to unzip the fully native MBP would have destroyed the lipid bilayer of the experimental apparatus). Therefore no ion current drops were detected for low levels of denaturant in solution, panels a and c of Fig .. With the increasing of the guanidinium hydrocloride concentration, the frequency of translocation events increases, following a step-like denaturation curve. Also, the average duration of long blockade events reduces, up to reach the phenomenology depicted in panel f, where practically all the MBP molecules are unfolded in the bulk and accomplish translocation in a very short time 1ms. In between these states (fully native and fully denatured), complex phenomenologies occur: short and long current blockades coexist, testifying the presence of partially folded structures in solution, the characteristic transport time of which is order of magnitudes greater than the one of unfolded polymers, panels d and e. The mechanical nature of such bottlenecks of translocation has been deduced since stalls shortened in time when the applied voltage was increased. Moreover, the average translocation time decreases with the increasing of the denaturant concentration. The latter evidence excludes the possibility that the MBP stuck along the pore, due to chemical bonds, a phenomenon that could in principle generate long blockades but that is also thought to increase if the denaturation degree of the molecules

In nature, MBP is responsible for the uptake and efficient catabolism of maltodextrins in the well-known bacterium Escherichia coli, increasing the solubility of recombinant proteins when expressed as MBP-fusion proteins. It is a monomeric periplasmic globular protein with two lobes (the C and N lobes) that close to form the binding site. It is a single chain of amino acids with no disulphide bonds. Its molecular weight is approximately 42.5 ku.

Figure .: Adapted from []. Left side: Snapshots of the system and measurements. Panel a: Schematic representation of Maltose Binding protein and α-hemolysin without Gdm-HCl in solution. The current is constant at 100pA (open pore conductance) and the MBP can not enter into the pore. Panel b: With Gdm HCl 1.35M, well above the unfolding transition, 0.8M 1.0M , the current trace decreases down to 20pA when an MBP molecule is in the channel. Right side: Part of current traces generated by the passage of MBP through the α-hemolysin pore as a function of the Gdm-HCl concentration. Panel c: At Gdm HCl 0.8M MBP practically does not translocate across the channel and no current drops are detected. Only few short blockades are observed, their average duration is 0.2ms Panel d: For Gdm HCl 0.85M, long blockades of 100s in average alternate with a series of short ones of 0.2ms in average. Panel e: At Gdm HCl 0.9M, the frequency of the short blockades increases (of average duration 0.2ms). The frequency of the long blockades also increases, their distribution is wide, their mean duration decreases, its value based on the statistical analysis of all events is 40ms. Panel f: For Gdm HCl 1M, only short blockades (of average duration . ms) are observed, MBP molecules are almost completely unfolded. The frequency of short events does not further increase.

increases, for the hydrophobic chains of the residues can more easily interact with the pore walls.

As already stated, translocation of Maltose Binding protein through the αhemolysin pore is the baseline for the research activity performed. Numerical investigations have the primary aim to explore the complex phenomenology just described in order to point out the role of denaturation on the transport across nanochannels.

#### **.Additional past work outline**

The aim of the current section is to touch on additional numerical and experimental works that played a remarkable role during the research activity.

From an experimental perspective, voltage-driven translocation of polypeptides is more problematic than for DNA or RNA strands so that numerical and theoretical support is extremely important. The main difficulties arise from the fact that proteins, differently from nucleic acids, are not uniformly charged molecules and, in addition, their natural tendency to form compact conformations or aggregates usually interferes with the passage through narrow paths, generating consequently a rich (almost overwhelming) collection of non-trivial phenomena [, , , , , , ] (Fig. .). Despite the difficulties, several research groups have showed that voltage-driven

Figure .: Conformations of Maltose Binding Protein (MBP) pulled across a αhemolysin channel (αHL) computed via a reduced-model simulation. The pore is portrayed as a cylinder. The protein description is reduced to its Cα-backbone. The chain needs to unfold to translocate from the cis to the trans-side. The image was made using VMD software [].

translocation can be achieved, not only for nucleic acids, but also for different peptides and proteins [, , , , , –], and other studies show that this technique is able to yield substantial structural information [], to characterize protein charge [] and to discriminate different degrees of folding or aggregation []. However, except for some encouraging evidences that nanopores can detect site mutations on proteins [], it is uncertain whether current blockages convey the necessary information to resolve each of the amino acids in order to achieve protein sequencing.

However, the number of experimental efforts is quickly multiplying [, ] due to the obvious biological and biomedical advances expected in this research field [, , ]. Therefore from both a theoretical [–] and computational [, –] perspective, nanopore technology represents a challenging field to

develop methods able to interpret experimental results and to support the optimization and design of nanopore devices [–].

Sometimes simulations and theoretical methods allow bypassing the experimental limitations, such as restricted access to the finer details of the dynamics and irreproducibility of specific phenomena due to the instability of the organic matter out of physiological conditions. There are various theoretical strategies to tackle the transport of proteins across nanopores, starting from the microscopic description. Atomic-scale simulations are extremely informative about the protein dynamics for they take into account the more thorough structural and interaction details of the whole system: protein, nanopore, lipid bilayer and ionic solution [, ]. The high system complexity, however, prevents atomic-scale simulations from cumulating the necessary number of events for a meaningful statistical mechanical description, even for small translocating molecules. Simulations via coarse grained models of protein transport across nanochannels constitute an interesting alternative to full-atom methods. As already mentioned, translocation is practically always coupled with unfolding of the biomolecule. The simplest model to account for such a coupled process is to assume that both translocation and unfolding are accomplished by a constant mechanical force. Even this approach is computationally challenging and not devoid of drawbacks, because biologically or experimentally relevant time-scales are orders of magnitude greater than those reproducible by full atoms simulations and, in addition, the constant force assumption could not be realistic to describe the translocation in cells. In fact, typical simulations time scales range from nanoseconds to microseconds (whereas experimentally measured translocation events are of the order of milliseconds). Also, it has been found that repetitive pulling actually catalyzes protein import into mitochondria, suggesting that this could even be a strategy selected by evolution to import proteins into organelles []. Specifically, as suggested in [], the interplay among cyclic operation of molecular motors, force-induced backbone strain, and non-native interactions might be of essential importance in mitochondrial import and not only, as pulling/pushing of proteins by molecular motors is widely observed in the delivery of biomolecules across several organelle membranes.

A simplified phenomenological description of nanopores, proteins and interactions, allocating a reasonable amount of computational resources, provides an immediate inference on the average pathway of the process. Moreover, the limited amount of CPU loading allows a massive statistics of translocation events to be collected. Clearly, the major limitation of coarse grained models lies just in their phenomenological nature, because the non-trivial simplifications introduced have to be compensated by an extensive input of experimental information in order to improve the predictive power of the models and their ability to match experiments. Likely, a suitable strategy to investigate theoretically a complex and articulated phenomenology, as the one emerging from the fast-developing nanopore technology, would require an integrated approach that cleverly combines coarse-grained modeling with full-atom simulations. In this direction the most famous example of a coarse grained molecular model that directly derives from full-atom force fields (and therefore is much more complex that the approach here actually used) is the Martini model [, ].

A burgeoning development of coarse-grained approaches to heteropolymer modeling has taken place in recent years []. The basic feature of protein minimalist models is to group together and identify side-chain atoms, or even a complete residue, into simpler units (virtual atoms, beads). These models are simple, fast to implement and require relatively small computational resources as they take into account only the Cα-carbon backbone dynamics. In this context, the Sorenson Head-Gordon (SHG) model [] is an off-lattice model that generalizes a previous model introduced by Honeycutt and Thirumalai []. The α-carbons are represented by three possible types of beads: hydrophobic (B), hydrophilic (L), and neutral (N). The force responsible for the collapse onto a compact structure is the attraction between B-beads, whereas all other pairs of interactions are repulsive and determine the rearrangement of the folded structure onto the native topology. The long-range interactions between far apart residues in the sequence are modeled through the potential:

$$V\_{LR} = \sum\_{i,j \ge i+3}^{m} \epsilon S\_1 \left[ \left( \frac{\sigma}{r\_{ij}} \right)^{12} - 2S\_2 \left( \frac{\sigma}{r\_{ij}} \right)^6 \right] \tag{2.1}$$

where the summation ranges from 1 to the number m of residues, ϵ sets the energy scale and σ is the excluded volume size used to obtain a self-avoiding polymer chain. Also, rij r<sup>i</sup> r<sup>j</sup> is the distance between residues (beads)i and j. The attractive forces between hydrophobic beads is attained by setting S<sup>1</sup> S<sup>2</sup> 1 for BB pairs. The others are all repulsive: LL, LB are characterized by S<sup>1</sup> 1 3 and S<sup>2</sup> 1 and NB, NL by S<sup>1</sup> 1 and S<sup>2</sup> 0. In order to achieve the proper secondary structure, bending and dihedral interactions, which surrogate sidechain packing and hydrogen-bonding, are introduced. The analytic expression of the dihedral potential is

$$\begin{aligned} V\_{dil} &= \begin{aligned} \sum\_{i=1}^{m-3} \left[ A\_i \{ 1 + \cos \phi\_i \} + B\_i \{ 1 - \cos \phi\_i \} \right] \\ &+ C\_i \{ 1 + \cos 3\phi\_i \} + D\_i \{ 1 + \cos(\phi\_i + \pi/4) \} \end{aligned} \tag{2.2}$$

where φ<sup>i</sup> indicates the angle between the two adjacent planes identified by the positions of four consecutive beads. The information on secondary structures is stored in the coefficients A, B, C, and D that determine a bias on the angles reflecting the tendency of residues to form one of the secondary motives: helical (H), extended (E), turn (T). Therefore, the primary structure must be complemented with the auxiliary sequence of E, H or T symbols, assigning the appropriate set of coefficients. Such a decoupling between primary and dihedral sequence

allows a fine structural design []. The bending angle interaction is modeled as a harmonic potential:

$$V\_{bend} = \sum\_{i=1}^{m-2} \frac{k\_{\theta}}{2} (\theta\_i - \theta\_0)^2,\tag{2.3}$$

with a constant k<sup>θ</sup> 20ϵ<sup>2</sup> rad so that large deviations from the equilibrium value θ<sup>0</sup> 104.85 are unlikely. The SHG force field is completed by stiff springs with equilibrium distance d<sup>0</sup> 3.8 Å, namely

$$V\_{pept}(r\_{i,i+1}) = \sum\_{i=1}^{m-1} \frac{k}{2} (r\_{i,i+1} - d\_0)^2,\tag{2.4}$$

used to maintain chain connectivity by simulating the presence of covalent peptide bonds between successive amino acids. In the SHG model a selected number of elements is used to capture the essential characteristics of the proteins. However, some features, such as hydrogen bonding and side chains, are missing. These limitations should be compensated through a design strategy for optimizing the sequence []. SHG models have been adopted variously in studies on protein unfolding (in free and nano-confined geometries []) and translocation []. In the next chapter the coarse grained model (Go model) used to per- ¯ form the research activity here reported is considered, along with all the other assumptions employed. A critical comparison between the performance of the Go and SHG models in reproducing the folding of the small WW-do ¯ main form in the HPin protein can be found in [].

Huang, Kirmizialtin and Makarov [], using a SHG force field, compare the mechanical unfolding of Ubiquitin in free space (AFM-like stretching) and in a semi-infinite pore, and show that the two dynamics are quite different. Moreover, the authors observe that the unfolding pathway in the pore is strongly dependent on both pore diameter and pulling terminus. The free energy profile as a function of the position of the pulled amino acid along the pore axis presents several branches corresponding to 'unfolding intermediates' before a plateau is reached once the whole protein has entered the pore. As we shall see, similar intermediates are also observed as 'stall events' in the present research activity in a finite-size pore []. Due to its structural simplicity, Ubiquitin was used also in other studies, for instance Ammenti et al. [] simulated Ubiquitin translocation across a long pore using the Go force ¯ field. The latter two examples [, ] are further considered in the next section as are deemed to constitute the reference coarse grained computational studies available at the beginning of the research activity. A brief introduction is provided to delineate the basic mathematical

Atomic Force Microscopy

frameworks that are still missing to perform an albeit qualitative description of these works.

#### **.Analytical models and simulations of protein translocation**

#### **..Overview**

A common analytical approach to model complex phenomena such as the translocation process, is to investigate the low-dimensional potential of the mean force landscape, which is relevant to suitably-defined reaction coordinates. This approach is often coupled with simulations where simplified models for the protein and its surroundings are exploited, both to investigate the dynamics of the system and to reconstruct the free energy profile, which governs the trend of the potential of the mean force as it will be clear later on. As developed further in chapter , a reaction coordinate is just a collective order parameter that encompasses and summarizes multi-dimensional phenomena, usually between two end-states. The choice of the reaction coordinate is often guided just by intuition and aim of the analysis. In forced dynamics a natural selection is to choose the degree of freedom that couples with the driving force (usually it is the component along the force direction of the position vector of the pulled residue of the protein). For umbrella sampling applications [] to translocation, however, the center of mass of the molecule is often the preferred parameter []. For mechanical stretching, a situation mimicking AFM pulling studies, the protein extension along the direction of the applied force is often implemented as a reaction coordinate. When the free energy landscape enters in the formulation of the potential of the mean force, which is the natural (and sole) option for canonical systems such as a single molecule in a heat reservoir, an implicit assumption is always done: the translocation can be presumed to take place in thermal equilibrium at any location along the pore, so to obtain a quasi-equilibrium process. It is however possible that a protein, while inside the tunnel, would be trapped in a metastable state so that its behavior is practically non-ergodic. The quasiequilibrium assumption allows one to avoid dealing with the actual kinetics of the translocation, that can be traced back to a series of quasi-static states, as the ones obtained by sampling the system with the umbrella sampling method. In this context the free energy trend, G<sup>0</sup> z , (here the reaction coordinate is assumed to be just one for simplicity and it is indicated by z without loss of generality) is related to the probability p z to find the chosen reaction coordinate at z (a collection of 'equivalent' macro-states of the system correspond to such a value of the collective variable) by the following relation

$$G\_0(z) = -k\_b T \ln p(z),\tag{2.5}$$

where k<sup>b</sup> is the Boltzmann constant and T the temperature of the heat reservoir. When external fields act on the system, the associated reversible work W z has to be taken into account by including in the previous expression a proper term. For instance, in the case of the mechanical stretching induced by a constant force f, exemplified in the next paragraph, the potential of the mean force reads as

$$G\_f(z) = G\_0(z) + W(z) = -k\_b T \ln p(z) - fz \tag{2.6}$$

(and now it is evident the reason why the free energy profile governs the potential of the mean force trend). Since continuum approaches are delved deeper later on in the thesis, the topic is not further considered in the present section. It has had the only aim to introduce the idea underneath the D continuum approach related to the already mentioned numerical investigations [, ]. Such a description is carried out in the next section.

#### **.. Coarse-grained molecular dynamics reference studies**

Huang, Kirmizialtin and Makarov [] performed an analysis relevant to protein import into mitochondria using Langevin dynamics simulations (see (.b)) of a coarse-grained off-lattice model, the already introduced SGH model, to investigate the co-translocation unfolding of a protein structure (the amino acids long Ubiquitin), pulled mechanically through a semi-infinite narrow pore, cylindrical in shape. It is emphasized that, despite the computational savings, due to the minimalist assumptions, biologically relevant time scales associated with barrier-crossing events are rarely accessible via direct simulations. In this work the potential of the mean force just introduced is exploited, i.e. equation (.), and the chosen reaction coordinate coincides with the z-axis coordinate of the pulled residue (the z-axis coincides in turn with the axis of symmetry of the pore). Thus the translocation is assumed to be slow enough with respect to the protein internal dynamics to be described as a one-dimensional driven diffusion along the pore axis in the potential G<sup>f</sup> z . To obtain the global shape of the free energy G<sup>0</sup> z , the umbrella sampling/weighted histogram method [] is used.

In [] the free energy profile resulting from translocation is also compared with the one stemming from mechanical stretching (AFM-like experiments) simulations and differences are discussed. It is in fact proven that significant differences exist between these two cases and that, in particular, AFM experiment outcomes cannot be easily scaled to match translocation results as each process occurs following distinct pathways. For details on amino acid interactions see section . Without entering the details of the calculations, Figure . panel a shows the profile of the potential of the mean force in the mechanical stretching case, for different values of the force (here the reaction coordinate is the molecule extension along the force direction). In this figure, contact maps are also depicted. These are the plots of the pairs

### Coarse-grained molecular dynamics and continuum models

of residues i, j such that their distance, in both the native and current configurations, is lower than (in this analysis) 7.5Å and 9.125Å respectively. Also, j i 3. Such residues interact following the long range potential (.). In panel b the same situation is showed for the translocation process when the N-terminus is pulled inside the pore. In the article [] the pulling from the C-terminus is also studied . While the overall free energy cost of extending the

Figure .: Adapted from []. Panel a: Free energy profile of Ubiquitin in units of ϵ<sup>h</sup> as a function of the mechanical stretching reaction coordinate for different values of the force. The minima correspond to the native-like state and unfolding intermediates and , which structure is depicted in the lower part of the figure, where also contact maps are showed. Panel b: Translocation of Ubiquitin driven by a force applied to its N-terminus. The potential of the mean force is plotted as a function of the position of the reaction coordinate, corresponding to the just mentioned end of the molecule. Native-like and intermediate structures - are shown with the relevant contact maps.

protein is similar, the profiles of G<sup>f</sup> z and, consequently, the force-induced unfolding mechanisms are different. The local minima correspond to unfolding intermediates that are obviously distinct. Certain similarities exist between mechanical stretching and translocation pathways, especially in the first step,

The N-terminus refers to the start of a protein or polypeptide terminated by an amino acid with a free amine group (-NH). The convention for writing peptide sequences is to put the N-terminus on the left and write the sequence from N- to C-terminus. When the protein is translated from messenger RNA, it is created from N-terminus to C-terminus. The latter is in turn the end of an amino acid chain (protein or polypeptide), terminated by a free carboxyl group (-COOH)

transition from structure (native-like) to structure , which is present in both cases and concerns the same event (separation of two terminal parallel strands).

In this study the dependence of the translocation time on the driving force is also examined, in addition to pore-size effects. Actually only a crude estimate of the effect of the driving force on the blockage time is carried out, as the authors assume that the average translocation velocity should be correlated to the overall free energy barrier, namely ∆G f maxG<sup>f</sup> z minG<sup>f</sup> z , encountered along the reaction coordinate. The unfolding free energy barrier does not depend linearly on the force, as it is often conjectured in mechanical stretching studies. The pore investigation is not considered here for brevity; it is just worth mentioning that if the pore is wide enough, the protein can perform transloca-

Figure .: Adapted from []. Translocation of Ubiquitin across a large pore. The reaction coordinate used to project the free energy profile is the position of the last amino acid of the chain along the pore, the N-terminus. Structures representative of the molecule as encountered during simulations are also depicted with their contact maps.

tion in a semi-folded fashion, for example in the way showed in Figure ..

As stated by the authors themselves, several limitations may prevent the direct comparison of the numerical results with the experimental ones. These are subsequently listed:


Coarse-grained molecular dynamics and continuum models


For completeness, it is worth mentioning a work where translocation of Ubiquitin is studied both in its thermodynamics and average kinetics []. In this analysis a pore of finite length is considered, thus tackling the issue, although it is long enough to accommodate the unfolded protein, i.e. if some residues still dwell on the cis-side, none has yet escaped the channel from the trans-side aperture. In [] statistical analyses are also carried out, accounting for blockade time distribution and translocation probability, by following the analytical method just sketched, in terms of first passage time statistics (again this perspective will be delved deeper further on in the thesis, chapter sec. .). In the remaining part of this section the main points of [] are summarized. The model employed for the protein is a Go-like one, see sec. ¯ . of chapter , while the pore confinement is again described by a cylindrical soft-core repulsive potential. The Langevin equation of motion still governs the simulation dynamics and umbrella sampling is performed by constraining the center of mass of the molecule with a harmonic potential. Multiple weighed histogram technique is use to deweight the umbrella sampling results [].

Translocation simulations are run until the protein is fully expelled out of the right end of the channel and almost complete refolding takes place. However, at low forces, Ubiquitin may fail to cross the pore within the simulation time window. The probability of translocation, P<sup>T</sup><sup>r</sup> , is estimated as the number of translocation success over the total number of runs as a function of the driving force and of temperature. Figure ., panel a, shows the curves of P<sup>T</sup><sup>r</sup> for two different values of the temperature, reporting a step-like trend, which delineates a clear value of the critical force, i.e the force for which P<sup>T</sup><sup>r</sup> 1 2. For instance, the decreasing of the critical force with temperature is a tangible consequence of the lowering of the free energy translocation barriers with temperature increase. Panel b in the same figure contains the trend of the average translocation time as a function of the pulling regime and for the two values of the temperature simulated. Only in the high force regime the average translocation time shows an Arrhenius-like dependence on the force, i.e. τ τ<sup>0</sup> exp βF L , with β 1 kbT , F the pulling force and L the pore length.

Figure .: Adapted from []. Panel a: Ubiquitin translocation probability as a function of the pulling force for two values of temperature. Pore length 300 Å, pore radius 4 Å, f<sup>u</sup> 6 pN, Tref 198 K and Tph 305 K. Panel b: Average translocation time τ of Ubiquitin as a function of the force F at the two simulated temperatures. t<sup>u</sup> 0.25 ps.

Characteristic trends of the free energy (indicated as G x ), obtained from umbrella sampling simulations and multiple weighted histogram analysis method, are reported in Figure .. These values refer to the position of the center of mass of the protein along the pore axis. The plateau indicates that once the protein is unfolded and completely inside the channel, it can slide along its axis without further free energy increase. This is a consequence of the pore geometry, for it is long enough to fully host the protein even when completely unfolded. In contrast, as expected, the main variations of the free energy occur at pore boundaries. In the proximity of the right end, when actually some of the residues are already located outside the pore, the molecule starts to be spontaneously expelled.

Figure .: Adapted from []. Free energy trend obtained from umbrella sampling simulations with a harmonic potential as umbrella potential in units of ϵ. The two curves refer to the two simulated temperatures (reference and physiological).

Coarse-grained molecular dynamics and continuum models

However, despite the insights gained, the general relationship between the structure of the proteins and their resistance to mechanically driven co-translocation unfolding remains poorly understood.

# **Chapter Molecular Dynamics Translocation Simulations**

#### **.Preamble**

A set of translocation simulations and the description of the relevant numerical tools are reported here. The attention is focused on driven transport of Maltose Binding protein (MBP) across nano-channels in the framework of coarse-grained modeling.

As already mentioned, this work is motivated by a recent experiment on the MBP voltage-driven transport across nanopores which explores the influence of denaturation on translocation pathways []. Specifically, short and long channel blockades, associated with the translocation of denaturated and partially folded MBP conformations respectively, are identified by large differences in the ion current drop durations.

Dynamically, translocation of proteins strongly depends on the denaturation degree and generally is coupled with an unfolding stage. In fact a folded protein with gyration radius larger than the pore narrower section needs a complete or partial unfolding to start the translocation [, ].

Structurally, MBP is a monomeric globular protein with 370 residues, resolved through x-ray crystallography by Quiocho, Spurlino and Rodseth []. Mechanical AFM pulling experiments by Bertz and Rief [] identified some structural regions - named unfoldons by the authors - as resistant areas to mechanical denaturation. The domains M, M, M and M, shown in Figure .A on the PDB structure (PDB-ID: mbp), can be related (one can expect such a thing) to the long blockade events in the MBP translocation. In computations we consider for MBP the Go-like model described in section ¯ ., as a natural approach to assess the impact of the molecule structural properties on translocation. The advantage of a coarse-grained description with respect to atomic scale models relies on the possibility to explore a large number of denaturation and pulling conditions, accumulating this way robust statistics of translocation events. For recent applications of coarse-grained models to various biochemical processes see the review [].

As first step we show that the model is able to reproduce the general features of MBP chemical denaturation. Afterward we study the translocation process in a pore reproducing the average dimensions of the αHL channel. We find the translocation dynamics to be strongly affected by the protein denaturation state,

Figure .: Adapted from []. A) Maltose Binding Protein structure with the four unfoldons highlighted. M-darkest gray (residues: -), M-light gray (residues: - ), M-dark gray (residues: -), M-lighter gray (residues: -). Panel B reports the positions of the five stall-points found in our translocation runs (residues , , , , ) in a D-topological view.

similarly to the experimental evidence. In particular, the translocation of chemically unfolded MBP conformations requires relatively low forces and once the pulled terminus enters the pore, the transport proceeds uniformly. In contrast, native-like structures exhibit a richer phenomenology: stronger forces are required to trigger the transport that, once started, developsin a stick-and-slip fashion, through bottlenecks and jerky movements caused by the rearrangements of the folded part of the protein that has not yet engaged the pore. In this case the issue is to identify the MBP structural motives responsible for the translocation slowing down. First we exclude these stalling stages to be related with the unfoldons, by showing that translocation pathways significantly differ from the free space unfolding dynamics. Then, by means of an analysis of static and dynamic native contact maps, we show that the stall points of the translocation pathway are mainly due to the protein regions more dense in long range native interactions. This result might constitute a possible criterion to predict a-priori some statistical features of protein translocation from a simple static structural analysis.

#### **.Three-dimensional models for MBP simulations**

In order to simulate numerically translocation phenomena, a coarse-grained computational model for the Maltose Binding protein and its surroundings is developed (Fig. .). As it is described below, choices are made in terms of the importing mechanism, pore and particle interaction potentials, numerical integration scheme and equation of motion for the material points constituting the

system. Finally, the approaches followed in the practical implementation of the numerical simulations are also expounded.

# **..MBP numerical model: The Go-like approach ¯**

The phenomenological off-grid model proposed by Nobuhiro-Go and ¯ Harold A. Scheraga (minimalist off-lattice native-centric C<sup>α</sup> Go-model) [ ¯ , ] is a coarse-grained model that identifies all the amino acids with their C<sup>α</sup> carbon atoms (Fig. . panel c). Therefore, the protein is reduced to a sequence

Figure .: Panel a: Maltose Binding protein appearance according to a full atom representation. Panel b: Secondary structure of the MBP evinced from its protein data bank file, PDB-ID: mbp. Panel c: Amino acid core structure. The characteristic residue chain of this elementary constituent is summarized by a box surrounding a capital R. The C<sup>α</sup> carbon atom binds directly with the residue chain. Panel d: MBP coarse grained representation. Accordingly to the Go model, only the ¯ C<sup>α</sup> carbon atoms along the main backbone chain of the protein are depicted.

of material points (also called beads) coinciding spatially with the C<sup>α</sup> atoms of the protein backbone and no side chain characteristics are retained (panel d of the same figure). The atomic mass is the same for all the residues and equal to 1 in the so-called 'code measure units' or 'internal units'.

A lot of variances and refinements of this model are available in the current literature (see e.g. [, –]). All these approaches are generally referred as Go-like models. Here the approach presented in [ ¯ ] is implemented.

The minimum energy is assigned to a particular reference configuration, usually coincident with the native structure of the considered protein. This is why the Go model can be properly applied only when the crystallized st ¯ ructure is already known and the folding dynamics are mainly determined by the topology of the natured conformation, as it happens with small globular proteins. The Go-like model generates an almost ideal folding pathway (ener ¯ getic traps associated with metastable conformations able to hamper the folding process are practically absent). Also, the native configurations so obtained are stable and close to the reference one (in terms of root mean square deviation []), which can be extracted, for example, from the protein data bank. It is a particularly suitable approach to simulate two-state folding dynamics, but it has been used also to predict intermediate configurations [].

Thanks to its properties and to its simplicity the Go model has been widely ex- ¯ ploited for protein numerical analysis [, , –]. The main limitation of the approach is the requirement that the native tertiary structure of the protein has to be known. In addition, its topological nature does not account for chemical factors, determining a considerable disagreement between real and simulated time scales: kinetic peculiarities of the folding process are not properly predicted. These discrepancies do not alter the folding thermodynamics and the correct sequence of events. In most cases the model allows the correct assessment of the free energy values and of folding pathways. Moreover, transition states are coherent with the ones experimentally determined.

In the remaining part of this section the practical implementation of the model is presented. Following the Go-like approach selected, the potential acting ¯ on the residues of the MBP is constituted by four parts.


Let ri, r<sup>0</sup> <sup>i</sup> (with i 1, ..., m) be the position vectors of the m residues identified by their C<sup>α</sup> carbon atoms, in the reference (native) and current configurations, respectively. The peptide potential, responsible for the covalent bonds between the beads of the polymer chain, has the following representation:

$$V\_p(r\_{i,i+1}) = \frac{k\_p}{2}(r\_{i,i+1} - r\_{i,i+1}^0)^2,\tag{3.1}$$

Figure .: Panel a: Schematic view of the peptide bond. Panel b: Sketch of the bending angle interaction. Panel c: Representation of the twist angle formed by the two planes determined by four consecutive material points of the protein structure (three by three).

where ri,i <sup>1</sup> r<sup>i</sup> r<sup>i</sup> <sup>1</sup>, r<sup>0</sup> i,i <sup>1</sup> <sup>r</sup><sup>0</sup> <sup>i</sup> <sup>r</sup><sup>0</sup> <sup>i</sup> <sup>1</sup> are the position vectors (their norm is the bond length) between the bonded residuesi and i 1 in the instantaneous and native configurations, respectively (panel a of Fig. .). Moreover, k<sup>p</sup> 1000ϵ d<sup>2</sup> 0 is the empirical constant in terms of the parameters d<sup>0</sup> and ϵ named, respectively, the equilibrium length and the unit of energy. In particular, d<sup>0</sup> 3.8Å is the average distance between two adjacent amino acids and ϵ sets the energy scale of the model. In summary, the bond potential is nothing more than a simple harmonic potential.

The angular bending potential V<sup>θ</sup> plays a fundamental role in recovering the secondary structure of the reference native configuration of the protein. It is mathematically equivalent to the peptide potential, where relative displacements are substituted by angular differences, namely

$$V\_{\theta}(\theta\_{i}) = \frac{1}{2}k\_{\theta}(\theta\_{i} - \theta\_{i}^{0})^{2},\tag{3.2}$$

where <sup>k</sup><sup>θ</sup> <sup>20</sup><sup>ϵ</sup> rad <sup>2</sup> is the elastic constant expressed in terms of the energy computational unit and θi, θ<sup>0</sup> <sup>i</sup> are the bond angles formed by three adjacent beads in the simulated (time-dependent) and native conformations, respectively (Fig. .b).

An additional contribution to the composition of the secondary structure is the dihedral potential, expressed as a function of the twist angles φ<sup>i</sup> and φ<sup>0</sup> <sup>i</sup> , again referred respectively to the actual and crystallized conformations. The twist angle is the angle formed between the two planes determined by four consecutive amino acids (three by three) along the chain, panel c. The definition of the twist angle potential V<sup>φ</sup> is

$$V\_{\phi}(\phi\_i) = k\_{\phi}^{\{1\}} \left[ 1 - \cos(\phi\_i - \phi\_i^0) \right] + k\_{\phi}^{\{3\}} \left[ 1 - \cos 3(\phi\_i - \phi\_i^0) \right],\tag{3.3}$$

where k <sup>1</sup> <sup>φ</sup> <sup>ϵ</sup> and <sup>k</sup> <sup>3</sup> <sup>φ</sup> ϵ 2 are the dihedral constants.

In the Go model a distinction is made among the pairs of residues that i ¯ nteract following a potential that has also an attractive part, in addiction to a repulsive Coarse-grained molecular dynamics and continuum models

one, set to take into account excluded volume effects, and the ones that, instead, only repulse each other sterically. The distinction is made on the basis of the native distances with respect to a parameter of the model, the so-called 'cut-off radius', Rc. If the native distance between two (i and j) non-adjacent residues, i.e. j i 3, is lower than the cut-off distance, they are considered to be in native contact and will interact following a - Lennard-Jones potential (.). In other words, such residues will interact attractively if brought to a distance greater than the native one and repulsively otherwise:

$$V\_{nat}(r\_{ij}) = \epsilon \left[ 5 \left( \frac{r\_{ij}^0}{r\_{ij}} \right)^{12} - 6 \left( \frac{r\_{ij}^0}{r\_{ij}} \right)^{10} \right],\tag{3.4}$$

where all the symbols have been already defined. When r<sup>0</sup> ij Rc, the purely repulsive contribution Vnnat is assigned to the pair of amino acids considered. This term is based on excluded volume effects. It enhances the cooperativity in the folding process and takes the form of a Lennard-Jones barrier

$$V\_{nmat} \left( r\_{ij} \right) = \frac{10}{3} \epsilon \left( \frac{\sigma}{r\_{ij}} \right)^{12},\tag{3.5}$$

where σ is, similarly to the SHG model, a free length parameter correlated with the extension of the excluded volume (self-avoiding polymer chain). It is usually set equal to the average dimension of the amino acids, approximately 4.5Å. The total potential acting on all the residues of the protein is then:

$$V\_{G\bar{o}}(r\_{ij}) = \sum\_{i=1}^{m-1} V\_p(r\_{i,i+1}) + \sum\_{i=1}^{m-2} V\_{\theta}(\theta\_i) + \sum\_{i=1}^{m-3} V\_{\phi}(\phi\_i) + \sum\_{i,j>i+3} V\_{nb}(r\_{ij}),\tag{3.6}$$

where the non-bonded term Vnb summarizes the possible long range interactions just described and reads as

$$V\_{nb}(r\_{ij}) = \epsilon \left\{ \begin{array}{ll} V\_{nat} = \left[ 5 \left( \frac{r\_{ij}^{0}}{r\_{ij}} \right)^{12} - 6 \left( \frac{r\_{ij}^{0}}{r\_{ij}} \right)^{10} \right] & r\_{ij}^{0} < R\_c, \\\ V\_{nnat} = \frac{10}{3} \left( \frac{\sigma}{r\_{ij}} \right)^{12} & r\_{ij}^{0} > R\_c. \end{array} \right.$$

In what follows ϵ 1 and R<sup>c</sup> varies in the range 3.0 7.5 Å.

#### **..Pore and pulling models**

The confinement effect on a protein driven inside a narrow space such as a nanopore can be represented by a step-like soft-core repulsive cylindrical po-

Figure .: A step-like 0 L cylindrical soft-core repulsive potential V<sup>c</sup> is chosen to model the effect of the confinement inside the pore. A constant force is used to drive the MBP inside the channel. It is always applied to the foremost residue inside the pore.

tential. For simplicity the cylinder axis of symmetry is set coincident with the x-axis of the frame of reference used for translocation simulations.

The same direction is used to develop the mechanical pulling of the MBP by the application of a constant force to the foremost residue inside the channel. The constant action is thought to constitute a reasonable approximation for the average effect of the electrical potential in a voltage-driven translocation experiment.

The channel potential is given in (.) and schematic representations of the system, the potential and the pulling mechanism, are shown in Figure ..

$$V\_c(x, y, z) = V\_0 \left(\frac{\rho}{R\_p}\right)^{2q} \Theta[x(L - x)].\tag{3.7}$$

Here Θ s 1 tanh αs 2 is a smooth step-like function limiting the action of the pore potential in the effective region 0, L . L and R<sup>p</sup> are pore length and radius respectively. Also, ρ y<sup>2</sup> z<sup>2</sup> is the radial coordinate. A convenient choice of the other parameters is q 1, α 3Å <sup>2</sup> and V<sup>0</sup> 2ϵ []. The driving force acts only in the region in front of the pore mouth (the capture one) x 2, 0 , and inside the channel 0, L . Pore length L 100Å and radius R<sup>p</sup> 10Å are taken from αHL structural data.

#### **.. Langevin dynamics for the material points of the MBP**

The choice for the dynamics of the beads is not different from what is usually employed in coarse-grained molecular dynamics simulations for it is the well-known Langevin equation (.a) that governs the motion of the material points. Consequently, numerical investigations are performed at constant temperature. The overdamped limit is actually implemented, i.e. r¨ 0, (.b); a standard Verlet algorithm is used as numerical scheme for time integration []. In other words, any particle of the system follows a dynamics enforced by this three-dimensional equation, when a numerical simulation runs:

$$
\ddot{r\_i} = -\gamma \dot{r\_i} - \nabla\_{r\_i} V(r\_i) + \Gamma(t) + F\_{ext}, \tag{3.8a}
$$

$$\dot{r\_i} = \frac{-\nabla\_{r\_i} V(r\_i) + \Gamma(t) + F\_{ext}}{\gamma}.\tag{3.8b}$$

Here γ is the friction coefficient used to keep the temperature constant (also referred to as Langevin thermostat) and Γ t takes into account thermal fluctuations, being a delta-correlated stationary and standard Gaussian process (white noise). Practically, the latter quantity represents a random force that, along with the friction term, satisfies the fluctuation-dissipation theorem (namely the mean-square value of Γ is proportional to the corresponding friction coefficient, γ). V r<sup>i</sup> is the total potential on the material points (generally speaking it includes both the terms belonging to the Go-like model ¯ and to the potential used to model the pore confinement (.) and (.), respectively, depending on the position r<sup>i</sup> of the issued bead) and Fext takes into account the possible contribution of the external forces (in the present setting it coincides with the force driving the translocation, which acts only on the pulled residue).

Several analytical models can be implemented starting from a Langevin approach but, since this topic is developed when D-continuum modeling is considered, it is not further discussed here.

#### **.Methods for translocation and stretching simulations**

The following sections describe the practical implementations of the numerical investigations in the cases of mechanical pulling in a confined environment (translocation simulations) and in free space (AFM-like mechanical stretching simulations). Thermalization runs are also outlined when necessary as they have been used to obtain statistically independent initial configurations.

#### **..Translocation simulations**

For each cut-off radius Rc, thermalization simulations are performed at T 0.75 θ<sup>u</sup> in the already mentioned 'code measure units' (symbols will be usually dropped in what follows) to generate initial MBP configurations with one of the two ends constrained near the pore entrance. More in general, in a thermalization (or equilibration) simulation the protein is left free to fluctuate and possibly to unfold (depending on the value of the cut-off radius) in the heat reservoir. Protein configurations are sampled at time intervals equal to 10% of the simulation time window, T<sup>w</sup> 10<sup>5</sup>t<sup>u</sup> with t<sup>u</sup> time units, to ensure statistical independence of initial states. After thermalization, the protein is driven by applying the pulling force F in the capture region and into the pore.

Protein translocation is considered accomplished when the last residue leaves the trans-side of the channel. As it will be shown in the result section, there are conditions where the molecule escapes the capture by diffusion, despite the action of the importing force. To save computational time, simulations are stopped and discarded if the protein reaches a distance from the pore entrance comparable to its native state linear size (approximately 20Å). This criterion is based on a preliminary estimate of a negligible probability for the molecule to re-approach the pore by diffusion and re-start the translocation.

To set the nomenclature up, such cases of protein escape will be labeled as loss events. Cases where the protein is neither translocated nor lost at the end of the time window are indicated as unsuccess.

#### **.. Stretching simulations**

Before each stretching simulation (AFM-like), the protein is equilibrated without constraints or additional forces in order to obtain a proper initial configuration. Stretching is simulated by using a constant velocity Steered Molecular Dynamics (SMD) strategy [] where protein elongation is induced by a spring of elastic constant k 0.1ϵ. The protein N-terminus is held fixed, the C-terminus is attached to the first end of the spring, which second end is dragged at constant velocity v in the direction of the initial end-to-end vector. To test the resulting robustness of simulation protocols, we use different steering velocities and perform AFM-like stretching also from the N-terminus (with the C-terminus blocked).

As mentioned in the introduction, MBP contains four unfoldons (M, M, M and M, Fig. .). To monitor the denaturation degree of the k-th unfoldon domain, we consider the number of its internal active native contacts, normalized to the corresponding value in the PDB crystallized structure, as a function of time. Two residues originally in contact are considered detached when their distance in the actual molecule conformation exceeds 1.23r<sup>0</sup> ij . The latter criterion is based on a negligible attractive force beyond such a distance, which approximately coincides with the point where the - Lennard-Jones potential (.) changes curvature.

#### **.Translocation and stretching numerical results**

#### **..Denaturation characterization**

In the Go-model force ¯ field (.) a decrement in the cut-off radius R<sup>c</sup> plays virtually the rôle of a chemical denaturing agent because, by reducing the number of attractive long-range interactions (.), the native state is destabilized with respect to the action of thermal fluctuations. Denaturation can be also achieved by increasing the temperature or decreasing the energy scale ϵ (thermal denaturation). The latter approach would be however too global to be compared to specific experiments [, ], because it would soften both bonded and nonbonded interactions (see eqs. (.), (.)). The denaturation we implement in

Figure .: Panel a: Contour map of the RMSD as a function of the cut-off radius and simulation temperature. Darker regions reflect states were the equilibrated protein structure resembles the native configuration. The transition from folded to unfolded state is a twostate transition and is extremely sharp, in compliance with the experimental findings on MBP denaturation. Panel b: Denaturation plot of MBP obtained by thermal equilibrium simulations at T 0.75. The proper rescaling, (.), of RMSD makes simulation data to collapse onto Ganesh et al. denaturation plot [] (fraction of unfolded structures by circular dichroism spectroscopy).

such a minimal model context is the closest to the chemical one, as it reduces only the number of attractive non-bonded interactions responsible for the collapse onto compact native structures. That behavior mimics the chaotropic action of a solute (denaturant) which destabilizes the native states by competing with internal non-covalent protein interactions, such as hydrogen bonding and hydrophobic effect.

We thus expect that laboratory denaturation conditions can be effectively taken into account by a suitable choice of the cut-off radius. In fact panel a of Figure . depicts a contour plot of the Kabsch distance (mean root-square deviation of a structure with respect to the native state []) as a function of the simulation temperature and cut-off radius. The transition from the region where MBP structures are folded (black) to the denatured one (white) is sharp, thus pointing out the suitability of the selected approach to model chemical denaturation. Experimentally, Ganesh, Shah, Swaminathan, Surolia and Varadarajan have analyzed chemical MBP denaturation by guanidine hydrochloride (Gnd-HCl) at constant temperature, T 28 C. They have shown that the unfolded protein concentration ρu, at different GndHCl concentrations D , estimated via circular dichroism spectroscopy, could be fitted by means of a standard two-

state model (dashed line in Fig. . panel b). The latter Figure compares the denaturation of Go-model MBP as a function of the cut-o ¯ ff radius Rc, with the denaturation curve in []. As a nativeness indicator for the numerical data we employ the already mentioned RMSD. Mapping between simulated and experimental denaturation results is established after a base-line subtraction of the RMSD values and a normalization to the 0, 1 interval. It reads

$$\left[D\right] = \frac{a}{R\_c - R\_c^0} - \frac{a}{R\_c^\* - R\_c^0},\tag{3.9}$$

where our reference interaction cut-off R<sup>c</sup> 7.5Å corresponds to the fully native state D 0 in the experiment; R<sup>0</sup> <sup>c</sup> 4.0Å is in turn the cut-off radius for complete denaturation and a is a tunable parameter adjusted to achieve datacollapse.

#### **..Translocation dynamics**

We characterize translocation success (top left panel of Fig. .) in terms of translocation probability PTr. The fraction of translocated simulations is considered with respect to the total number of attempts, as a function of the importing force F. Figure . refers to T 0.75 and R<sup>c</sup> 6.8Å that, as shown in Figure .b, is the lowest Rc-value corresponding to native-like conformations, for both N-pulling (empty symbols) and C-pulling (filled symbols). In addition to PTr, the left side of Figure ., i.e. panels b and c, provides also the fraction of loss events, PL, and unsuccessful events P<sup>U</sup> 1 P<sup>L</sup> PTr. P<sup>L</sup> is the percentage of molecules that escape the capture by diffusion in the bulk. P<sup>U</sup> is the fraction of proteins that are neither translocated norlost within the time window Tw. The step-like shape of PTr vs. F allows a clear definition of the critical force F<sup>c</sup> as the value for which PTr F<sup>c</sup> 1 2. If F Fc, most of not-translocated runs are lost, since the importing force is so weak to be easily overcome by thermal motions. By increasing F, P<sup>L</sup> rapidly decays whereas the number of translocated and unsuccessful runs increases. For F Fc, almost all the untranslocated proteins correspond to unsuccessful runs and very few are lost. The overall scenario also indicates an asymmetric translocation process of folded MBP, which depends on the pulling terminus. The difference in F<sup>c</sup> and in the number of proteins stuck in the pore between N- and C-pulling suggests that the transport pathways are different in the two cases.

To characterize the role of denaturation in the translocation mechanism, we estimate F<sup>c</sup> at different cut-off radii. Panel d of Figure . shows that translocation at low denaturation (large Rc) requires high forces. Decreasing Rc, the critical force is reduced and below R<sup>c</sup> 6.8Å it reaches a plateau. Actually, once denaturated (R<sup>c</sup> 6.8Å, see Fig. .), the dynamics of random coil MBP configurations becomes unaffected by a further reduction of Rc. In contrast, as for R<sup>c</sup> 6.8Å the MBP structures become more and more compact and stable, the

Figure .: Here and in the following figures units are internal, i.e. related to the coarsegrained model we used. Distances, when present, are in Å. Empty symbols N-pulling, filled symbols C-pulling. Left hand side: Statistics of translocation as a function of the importing force F at R<sup>c</sup> 6.8Å and T 0.75: (a) translocation probability PTr, (b) loss event probability P<sup>L</sup> and unsuccessful run probability P<sup>U</sup> (c). Right hand side: Critical force F<sup>c</sup> as a function of R<sup>c</sup> for T 0.75, panel d. Loss and unsuccess events probability, P<sup>L</sup> and P<sup>U</sup> respectively, at critical force, panel e.

increment of the critical F<sup>c</sup> is naturally expected, due to a stronger resistance to unfolding. Therefore, the critical force is a quantity able to discriminate folded and unfolded MBP structures, in analogy to the applied voltage in experiments, (see [, , ]).

A further difference between denaturated and native-like MBP translocation is revealed by the amount of loss and unsuccessful events at critical force. Panel e of Figure . shows a clean transition at R<sup>c</sup> 6.8Å in both P<sup>L</sup> and P<sup>U</sup> at critical force. At smaller R<sup>c</sup> (denaturated state) P<sup>L</sup> 0.5, hence none of the unstructured chains gets stuck in the pore. That behavior is compatible with the idea of an importing force competing with thermal fluctuations to insert a residue in the pore: F<sup>c</sup> Fth kbT d<sup>0</sup> 0.2, with d<sup>0</sup> 3.8Å the average distance between consecutive residues. Once the first core of residues is imported, denaturated structures oppose a weak resistance and their translocation is easily accomplished. In contrast at larger Rc, the critical force is significantly greater than Fth. Hence, almost all MBP configurations start getting imported and those unable to finalize the transport within the allotted time end up stuck into the channel. At low denaturation, long pore blockades are entirely due to the structural resistance of folded MBPs to mechanical unfolding.

#### **..Residence time statistics**

Figure . reports the average translocation (or blockade) time τ<sup>b</sup> as a function of the importing force for different values of the interaction cut-off Rc, for both the N-terminus and C-terminus pulling (panel a and b respectively). For a

Figure .: Panel a: Average translocation time on the sub-ensemble of successfully translocated runs τ<sup>b</sup> vs. importing force F. The different symbols concern different cut-off radii, namely R<sup>c</sup> 3.0, 5.0, 6.2, 6.5, 6.8, 7.0, 7, 2 and 7.5Å are represented by upward triangles, downward triangles, squares, circles, diamonds, first-quadrant-filled circles, second-quadrant-filled circles and pentagons, respectively. The dashed line represents the fit (.). Data for N-pulling simulations. Panel b: Average translocation time on the sub-ensemble of fully translocated simulations τ<sup>b</sup> vs. importing force F. Different symbols represent different cut-off radii, namely R<sup>c</sup> 3.0, 5.0, 6.2, 6.5, 6.8, 7.0 Å and 7.5Å, squares, circles, downward triangles, upward triangles, cross, diamonds, stars respectively. The dashed line represents the fit, (.). Data for C-pulling simulations. Panels c and d: translocation time distribution for R<sup>c</sup> 6.5Å and F 0.575 (denaturated) and R<sup>c</sup> 6.8Å and F 1.10 (native) respectively. The dashed lines in panel c and d are an inverse Gaussian (.), and a double exponential fit (.).

given cut-off radius, τ<sup>b</sup> generally decreases with increasing F, as it is expected. However for R<sup>c</sup> 6.5Å, the τ<sup>b</sup> vs. F data collapse onto a single curve (dashed), confirming that MBP structures are completely denaturated and react similarly to the importing force. The curve is well fitted by a three-parameter (τ0, F0, µ0) relation, namely

$$
\tau\_b(F) = \tau\_0 \text{ e }^{\bullet F/F\_0} + \frac{L}{\mu\_0 F},
\tag{3.10}
$$

which is a linear combination of an activation term exp F F<sup>0</sup> , and a constant velocity drift 1 F. The origin of the activation term can be explained as in what follows. An irreversible translocation can occur only after a stable core of residues is established inside the channel. This sort of capture process requires the overcoming of the initial entropic barrier due to the strong confinement of the chain induced by the channel, that even unstructured polypeptides experience []. For R<sup>c</sup> 6.8Å, the average translocation time grows with the decrement of F (crosses, diamonds, stars in Fig. .a and b), indicating that translocation is drastically slowed down. At large enough forces, overwhelming the stability of the native-like MBP, τ<sup>b</sup> tends to collapse on (.), regardless the value of the cut-off radius. The curve depicted by (.) thus constitutes a sort of base-line for all translocation times. In low-force regimes, the bending of data toward the horizontal line τ<sup>b</sup> T<sup>w</sup> is an effect due to the time window finiteness. In Figure .a and b, that effect has been partially corrected by using the formula

$$
\tau\_b = \frac{1}{P\_{\rm Tr} + P\_{\rm U}} \left( P\_{\rm Tr} \,\,\tau\_{\rm Tr} + P\_{\rm U} \,\, T\_w \right), \tag{3.11}
$$

where τTr is the average time from the translocated runs and PTr is the corresponding probability (see Fig. .). The use of (.) can be justified. Let us consider a set of M runs, where M<sup>T</sup> of them translocate within Tw, while M<sup>U</sup> do not accomplish the passage but are expected to translocate in a time greater than T<sup>w</sup> and M<sup>L</sup> are lost (will practically never translocate), obviously M M<sup>T</sup> M<sup>L</sup> MU. By definition, after discarding lost events, the average translocation time is

$$\tau\_b = \frac{1}{M\_\mathrm{T} + M\_\mathrm{U}} \sum\_{r=1}^{M\_\mathrm{T} + M\_\mathrm{U}} t\_r,$$

where t<sup>r</sup> is the translocation time of the r-th run (thought for T<sup>w</sup> ). We could rewrite τ<sup>b</sup> by separating the runs for which t<sup>r</sup> T<sup>w</sup> from the runs for which t<sup>r</sup> T<sup>w</sup> and, consequently, split the summation into

$$\tau\_b = \frac{M\_\mathrm{T}}{M\_\mathrm{T} + M\_\mathrm{U}} \frac{1}{M\_\mathrm{T}} \sum\_{r=1}^{M\_\mathrm{T}} t\_r' + \frac{M\_\mathrm{U}}{M\_\mathrm{T} + M\_\mathrm{U}} \frac{1}{M\_\mathrm{U}} \sum\_{r=1}^{M\_\mathrm{U}} t\_r''.$$

Here, the first average involves the terms t<sup>r</sup> Tw, while the second one the terms t<sup>r</sup> Tw. By considering that P<sup>U</sup> M<sup>U</sup> M and PTr M<sup>T</sup> M, we get the expression

$$
\tau\_b = \frac{1}{P\_{\rm Tr} + P\_{\rm U}} \left( P\_{\rm Tr} \,\tau\_{\rm Tr} + P\_{\rm U} \, T\_w \right),
$$

where we have used the lower bound for t<sup>r</sup> T<sup>w</sup> and the formula τTr 1 MT MT <sup>r</sup> <sup>1</sup> tr.

Panel c of Figure . shows a typical translocation time distribution ψ t of a denaturated MBP. It looks well localized around its average τ<sup>b</sup> and it is properly fitted by an inverse Gaussian, with parameters D0, µ0:

$$\psi(t) = \frac{L}{\sqrt{4\pi D\_0 t^3}} \exp\left\{-\frac{\{L - \mu\_0 F t\}^2}{4D\_0 t}\right\},\tag{3.12}$$

that is the first-passage time distribution of biased random walkers on the interval , L emitted in 0 and absorbed at and L []. An instance of the time distribution ψ t of native-like MBP translocation is plotted in panel d. In this case, ψ t has a fatter large-time tail and it is well reproduced by a double exponential:

$$\psi(t) = \frac{k\_1 k\_2}{k\_2 - k\_1} \left[ e^{\blacksquare \epsilon\_1 (t \to t\_0)} - e^{\blacksquare \epsilon\_2 (t \to t\_0)} \right],\tag{3.13}$$

with rates k1, k<sup>2</sup> and t<sup>0</sup> being an offset time interpreted as the time taken by a denaturated MBP to cross the pore. As we shall see below, the double exponential is justified by the presence of two successive stall points, corresponding to two energy barriers, the overcoming of which can be seen as an activated processes. Translocation phenomenology is better characterized by addressing the

Figure .: Time evolution of the number of residues on the cis-side of the pore Ncis for a pulling run at R<sup>c</sup> 6.8Å and critical force (panel A: C-terminus pulling, panel B: N-terminus pulling). Snapshots of representative configurations are reported for the plateaus corresponding to the protein blockades. The average time spent by the protein in different Ncis, at critical force, is reported in panels C, D and E for R<sup>c</sup> 6.8Å C-pulling, R<sup>c</sup> 6.8Å N-pulling, R<sup>c</sup> 6.5Å C-pulling, respectively.

time evolution of the number of residues Ncis on the cis-side of the pore. Panel A of Figure . shows Ncis t for one successful run at critical force for R<sup>c</sup> 6.8Å, pulled from the C-terminus. Clearly, most part of the time isspent by the MBP in two particular stalling states, Ncis 335 (C-St) and Ncis 267 (C-St), corresponding to configurations where either residue 335 or residue 267 are respectively located at the pore entrance. Representative snapshots of the two states are included in the figure for illustration. To gain more quantitative information we compute the time t<sup>r</sup> Ncis spent by the r-th run in a given Ncis state during translocation. Time t ¯ Ncis , defined as the average of t<sup>r</sup> Ncis over the ensemble of translocated runs, is plotted in panel C of Figure ., for C-pulling translocations at critical force and R<sup>c</sup> 6.8Å. The two peaks in the histogram correspond to the two blockade events shown in panel A. Clearly, translocation is far from being uniform and looks more as a step-like (or stick and slip) process, where the protein has to overcome successive free-energy barriers associated with specific structural rearrangements. The analysis of N-pulling (same cut-off radius and corresponding critical force), panels B and D in Figure ., confirms the overall picture: also in this case the protein spends most part of the time in specific configurations. The stalling events are three and take place at different positions, namely Ncis 363 (N-St), corresponding to residue 7 at the pore mouth, Ncis 307 (N-St, residue at the mouth) and Ncis 256 (residue , N-St). To complete the phenomenology, we note that the transport of denaturated proteins (R<sup>c</sup> 6.8Å) appears much more uniform (peak-less), panel E of Figure ..

The scenario for both N and C-pulling is robust under changes of both R<sup>c</sup> (provided that R<sup>c</sup> 6.8Å, i.e. native-like MBP) and pulling force range F, with only slight differences in the peak intensity (Fig. .). Concerning the unsuccessful translocations, we find that the already mentioned stall points still occur (Fig. . and Fig. .), just to confirm their leading role in transport bottlenecks. In summary, panel B of Figure .sketches in a D-topological view the positions of the five stall points detected for structured protein translocation runs (residues , , , , ).

The loss events can provide insights into the translocation process and might be related to experimental observations. A typical event observed in loss simulations is that the protein spends some time at the pore entrance, attempting to insert the terminus, before definitely escaping the capture by diffusion. The duration of the events in which the number of residues inside the pore is greater than 0 is well determined in lost simulations and defines a sort of protein-pore interaction time t<sup>I</sup> . Moreover, reasonably assuming that in between two successive events the protein remains close enough to the pore to generate some kind of blockage, it is possible to attempt a closer connection between simulation results and experimental findings. Panel A of Figure . depicts the interaction time distribution obtained for a denaturated protein while panel B shows the trend of the average interaction time τ<sup>I</sup> as a function of the denaturation degree (as usual characterized by the Go-model interaction cut-o ¯ ff Rc) for different importing force intensities. A common behavior of unfolded structures is evident, while the average interaction time for compact molecules results greater and dependent on the cut-off radius. Although a quantitative comparison might

Figure .: Average time spent by the protein as a function of Ncis. Panels of the first row refer to translocated simulations (left), and unsuccessful runs (right), both for Cpulling at critical force, R<sup>c</sup> 7.2Å. The latter panel clearly shows the presence of the two main blockades (C-St and C-St), which can be detected also for successful simulations, thus confirming the robustness of the data shown in panel C of Figure . with respect to changes in Rc. In the second row, analogous results are reported for N-pulling translocations: three main peaks are evident (N-St, N-St and N-St). Finally, in the last row the same data for successful simulations for R<sup>c</sup> 6.5 Å (left) and R<sup>c</sup> 6.2Å (right), both for N-pulling at critical force are depicted. For R<sup>c</sup> 6.5Å the peak structure still partially holds (two on three peaks are still detectable) and it is necessary to decrease the value of the cut-off radius to 6.2Å to recover a homogeneous behavior equivalent to the one shown in panel E of Fig ., where the pulling is from the C-terminus.

be too ambitious, the scenario we observe in the simulations closely resembles the presence of the 'bumping events' reported in the experiments by Mestorf et al. [] and classified as short-time low-intensity current blockades. In practice, the protein moves close to the pore partly blocking the current but does not start a translocation. The bumping time distribution results to be well fitted by an exponential (Fig. .A) and its average is about one order of magnitude lower than the corresponding average translocation time, a fact qualitatively in agreement with the experimental data (for the numerical specific case here considered τ<sup>b</sup> τ<sup>I</sup> 50).

#### **.. Stretching vs. translocation**

The previous picture of MBP translocation rises the natural question as to why the transport of initially compact structures becomes temporarily stalled at well defined stages. We relate these 'rate limiting steps' of translocation to

Figure .: Average time spent by the protein as a function of Ncis for unsuccessful runs for R<sup>c</sup> 6.8 Å. Left panel refers to C-pulling, right panel to N-pulling. The two peaks in the left panel and the three in the right one correspond to the peaks in panels C and D of Fig ., respectively.

Figure .: Panel A: Protein-pore interaction time (t<sup>I</sup> ) distribution and associated exponential fit for unfolded protein and small pulling force. Panel B: Trend of average interference time as a function of the pulling force and cut-off radius. Unfolded structures show a common behavior at short τ<sup>I</sup> while a greater average interaction time is associated to compact structures. The vertical dashed line marks the Rc-threshold separating in the Go-model approach native-like from fully denaturated conformatio ¯ ns. Different symbols correspond to four increasing force intensities (in the order: squares, circles, up and down triangles).

the MBP structural properties that govern the free-space stretching. As mentioned in the preamble of this chapter, targeted AFM experiments [] have shown that MBP mechanical stretching occurs via a sequence of events corresponding to the successive breakdown of specific domains, termed unfoldons. The analogy of pulled translocation with AFM mechanical stretching suggests that unfoldons might be involved also in MBP translocation bottlenecks. Preliminarily, we check, via standard Steered Molecular Dynamics stretching protocol [] that the Go-model of the MBP is able to reproduce the 'unfoldon ¯ picture' observed in the experiments []. In Figure .A the average over 50 runs of the force-extension (end-to-end distance Ree) curve of the MBP in the numerical stretching experiment is depicted. The three branches, denoted by M, M+M and M according to the nomenclature used in [], separated by the three worm-like-chain curves [], identify the same unfoldon opening sequence described in the reference experimental work, including the coupled un-

Figure .: Right: force (A) and average value of fraction of native contacts Q<sup>s</sup> Mk k 1, 4 (B) vs. the end to end distance Ree in stretching simulation at pulling velocity v 0.05, temperature T 0.5 and cut-off radius R<sup>c</sup> 7.5Å. Dotted lines represent worm-like chain fits []. Left: average value of Q<sup>s</sup> Mk as a function of Ncis for R<sup>c</sup> 6.8 and T 0.75 for N-pulling (C) and C-pulling (D) translocated runs at critical force. Vertical black lines mark the residues mainly involved in the translocation bottlenecks.

folding of M and M. Also, the stretching force required to perform constant velocity detachment follows a peculiar trend qualitatively in compliance with the experimental evidence (see Fig. . for a quick summary of the latter). The picture becomes clear when plotting the average fraction of the active native contacts Q<sup>s</sup> M k , k 1,..., 4 in each unfoldon as a function of the end-to-end distance Ree of the MBP. For instance, in the initial stage of the elongation, corresponding to the curve branch labeled by M, the structural parameter Qs(M), dash-dotted line, decreases indicating the rôle of M detachment in this stage. The simultaneous opening of M and M, solid and dashed line respectively, is also evident in the second branch. Finally M opens. Snapshots of the detachment sequence are shown in Figure ..

To understand the role of the unfoldons in the MBP translocation process, first we locate the five blockage-points with respect to the unfoldons in the protein structure (Fig. .). Then, again, we plot the unfoldon structural parameters Q<sup>s</sup> M k as a function of Ncis for C-pulled and N-pulled translocations, respectively (Fig. .C and D). In the C-terminus pulling case (Fig. .C), the sequence of unfoldon opening is essentially the same as in the mechanical stretching. First, M breaks down, followed by M and M, which dynamics is again correlated, and finally the M-opening concludes the process. The vertical dotted lines in Figure .C and D highlight the two stall points C-St, C-St in the C-pulling and the three other ones, namely N-St, N-St, N-St, in the N-pulling (see discussion of Fig. .). A very weak correlation emerges between unfoldon dynamics and blockades. Specifically, the two configurations reported in panel A of Figure . (see also Fig. .B and the vertical lines in Fig. .C and D) show that the first blockage-point for C-pulling (residue ) takes places in unfoldon M (darkest gray in the figure), while the second one (residue ) lies in the initial part of M (light gray). In the pulling from the N-terminus, we observe

Figure .: Adapted from []. Characteristic force-extension trace of a ddFLN--MBP chimera []. Unfolding of the native protein and subsequent elongation of the polymer is shown in the darkest gray (unfoldon M), the breakdown of M is shown in light gray, the detachment of M is shown in dark gray and the unfolding of M is shown in the lightest gray. Contour length increases are marked on worm-like-chain (WLC) curves fit to the data. At extensions 125nm the characteristic unfolding pattern of three ddFLN scaffold domains is visible (gray, leftmost and rightmost ends), followed by detachment of the protein. Inset: Superposition of several MBP unfolding traces zoomed into the region of the short-lived intermediate state M (dark gray) resulting from the detachment of M (light gray). A best WLC fit to the curves is shown in black. All light gray levels (lifetime of the intermediate state M) fall on the same WLC curve, indicating the breakdown of a well-defined state.

that M is the first to be broken followed by M, then M and finally M. Again the blockade-points do not appear directly related to the unfoldon boundaries, panel B of Figure ., with the sole exception of the stall N-St, located at the boundary between M and M.

#### **.Discussion**

An essential result of our analysis is a sharp change in the translocation dynamics upon varying the cut-off radius from R<sup>c</sup> 6.5Å to R<sup>c</sup> 6.8Å. We recall that, according to (.), a mapping between R<sup>c</sup> and the MBP denaturation degree can be established (Fig. .).

**Denaturated-MBP translocation** (R<sup>c</sup> 6.5Å). Below R<sup>c</sup> 6.5Å, the MBP behaves as a random coil: its translocation dynamics is basically independent of R<sup>c</sup> as revealed by the critical force behavior (Fig. .d) that remains almost constant for R<sup>c</sup> 6.5Å and by the blockage time data (Fig. .a and b) that

Figure .: Snapshots of MBP unfolding at T 0.5, R<sup>c</sup> 7.5 Å and v 0.001. The unfolding sequence is the one described by Bertz and Rief, []. The first part to open is the unfoldon M (darkest gray) as shown in panel B, then M and M unfold (light and dark gray), panels C, D and E. With the exception of the initial portion of M, the last structure to detach is M (lighest gray), panels F and G. In panel, A the initial folded MBP conformation is depicted.

collapse onto the same curve for every Rc, thus exhibiting a common behavior, (.). In this Rc-range, not only the critical force F<sup>c</sup> but also the probability of lost events at F<sup>c</sup> shows a plateau (Fig. .e). Moreover, the pulling terminus (C or N) does not influence significantly any of these quantities. The sole barrier in the translocation dynamics is due to the pore entrance; once the protein is captured, the average time spent at different stages of the process is similar (panel E of Fig. .). Thus, random-coil MBP translocation can be interpreted as a capture (activated) stage followed by an elementary first passage process, where random walkers under a constant bias are injected from the cis-side of the channel and absorbed to the trans-side.

**Native-like MBP translocation** (R<sup>c</sup> 6.8Å). Folded structures exhibit a much more complex dynamics, characterized by a transport getting stalled in the pore due to specific MBP configurations. Translocation times are greater than the unfolded case. Their average increases with R<sup>c</sup> (Fig. .a and b) and their distributions show fat tails (Fig. .d). These results agree with the experiments described in [], where, in the explored forcing regime, only denaturated proteins were able to rapidly translocate, whereas very long blockades pertained to partially folded ones. The capability of numerical techniques to explore the dynamical details evidences the presence of stable and reproducible bottlenecks of the transport, namely C-St, C-St, N-St, N-St, N-St in native-like MBP transport (Figs. ., . and .). Natural candidates for such stall points are the boundaries between MBP unfoldons. However, our results clearly indicates that the AFM-unfolding dynamics is not correlated to the stalls.

Rather, the analysis of MBP native-contact maps allows us to shed light on the actual mechanism underlying the stalls. Indeed, the blocking C-St and C-St, but also their counterparts for the N-pull case, can be interpreted in terms of a specific sequence of native-contact breaking an MBP sub-domain undertakes in order to engage the pore in an almost linear conformation. In fact, the stalling domains of the protein are those with the greater amount of external native contacts, where external means 'excluding all the inner contacts of the domain'. In formulas, if K is the domain, then its contact density reads

$$\rho\_c(K) = \frac{1}{m\_K} \sum\_{i \in K} \sum\_{j \notin K} M\_{ij},\tag{3.14}$$

where m<sup>K</sup> denotes the number of residues in K and Mij 1, if i and j form a native contact, and Mij 0 otherwise. In Figure . the contact map of native MBP, obtained for R<sup>c</sup> 6.8Å is considered. The lower side panel shows the average of external contacts composed by the ten-residue-long consecutive regions of the MBP. Peaks identify the protein regions with the greater number of contacts that are thus the most probable candidates to cause the stall points once engaging the pore entrance. However, by considering the dynamics data (like the configurations associated with the stalls, Fig. .C for example), we can be more precise by selecting the two critical regions around the peaks C-St and C-St: C-K = 328, 338 and C-K = 260, 270 , respectively, and the complementary regions X = 339, 370 , X = 271, 327 and X = 1, 259 , so to patch the whole molecular structure. Table .summarizes the density of external contacts (see (.)) that these regions form with the part of the molecule running from the proper boundary of the issued region to the free terminus (a generalization of this approach concerning only a static analysis of contact maps - without resorting to dynamical data - is provided in the next chapter). For instance, ex-

Figure .: Contact map of native MBP obtained for R<sup>c</sup> 6.8Å. Bottom panel depicts the average number of long range contacts each residue establishes. The peaks identify the regions of the chains with the larger density of external contacts, which are responsible for putative stall points. Vertical gray bands highlight the stalling regions as obtained by the analysis relevant to Figure .. These regions form a significant number of contacts with distal parts of the molecule and only when the contact cluster at the pore cis-side mouth is broken the translocation can proceed further on, up to the next bottleneck (next cluster).

ternal contacts pertaining to C-K bond this domain with residues from to (N-Terminus), as here C-terminus pulling blockades are exemplified. This approach allows tracking only the effective external contacts that come into play when a region is facing the pore, namely the contacts formed with just that part of the molecule which has not yet engaged the pore (backward contacts). The higher values pertain to C-K and C-K sub-domains. They confirm the picture emerging from Figure .. These areas, being with high contact density, oppose the maximal resistance to unfolding once in front of the pore. In other words, translocation bottlenecks are determined by those sub-domains that, still partially folded when approaching the pore cis-side, carry with themselves other distal regions of the molecule tightly bonded to by native interactions. The contact analysis leads to the same conclusion also for the N-terminus pulling, where the critical regions are N-K = 1, 11 , N-K = 55, 63 , N-K = 105, 115 .


Coarse-grained molecular dynamics and continuum models

Table .: Contact density of MBP subdomains described in the text. Col.: domain nomenclature, C-K, C-K and N-K, N-K, N-K stand for critical blocks in C- and N-pulling respectively, X, X, X and Y, Y and Y are the complementary regions. Col.: residues involved in each sub-domain. Col.: density of external native contacts as defined in (.). Critical blocks have a contact density larger than complementary regions.

#### **.Remarks**

We have simulated Maltose Binding protein translocation across αHL nanopore via a coarse-grained computational model for both the MBP and the pore. As the channel is narrow, translocation properties strongly depend on the denaturation state of the MBP. In the issued Go-like model, molecule ¯ denaturation is controlled by the parameter R<sup>c</sup> determining the number of native attractive interactions. In the region 6.5 R<sup>c</sup> 6.8 Å, a transition is observed from random-coil MBP (denaturated) to native-like structures. The transition emerges from both equilibrium (Fig. .) and transport simulations (Fig. .). In particular, translocation of denaturated MBP is almost uniform and consists of a capture stage followed by a simple driven diffusion process. The passage in the channel of folded MBP is more critical and interesting, it looks like a stick-slip dynamics characterized by constant-velocity transport broken by stalling events in the channel (Fig. .). For instance, the C-terminus translocation occurs via two long stall events resulting in a double-exponential tail-behavior of the translocation time distribution (Fig. .d). This behavior is presumably associated with the presence of two subsequent free-energy barriers that MBP has to overcome in order to complete the passage. Our analysis also shows that stall events are related to the MBP regions with high density of native external contacts. Thus, in principle, long blockade events and stall points can be predicted by looking directly at the MBP PDB-structure. In contrast, a weak correlation is found between stall points and unfoldons, the structural blocks through which the MBP reacts to mechanical stretching in free space. This result is a strong indication that, despite the analogy between pulled translocation and mechanical unfolding, the pathways gathered from mechanical pulling are not sufficient to make inferences on translocation

mechanisms. The action of the pore, indeed, drastically modifies the unfolding pathway during the transport with respect to a free pulling process.

# **Chapter -Dimensional Continuum Models**

### **.Preamble**

In order to interpret numerical results it is appropriate to built suited analytical models of the physical and chemical processes that occur in real translocations and to define proper statistical observables to grasp and summarize the essence of the phenomenon.

The first section of the present chapter is dedicated to an overview of the essential ideas behind low-dimensional continuum models used in the literature to summarize both experimental and three-dimensional molecular dynamics results.

Section highlights the practical implementation of the specific approach performed to obtain the free energy profile for the translocation simulation protocol described in chapter . Therefore, umbrella sampling simulations and multiple weighted histogram analysis method on coarse-grained molecular dynamics (MD) simulations are outlined []. The proteins under scrutiny are again described by the Go-like model already introduced, which accounts accurately fo ¯ r the influence of the native structures on both folding and translocation pathways [, , , ]. Finally, section expounds the achieved results and presents a comparison with the D numerical investigations. Also, since a mutant version of the MBP (termed MalE) and an additional protein, the Endoglucanase CelA (PDB-ID:ah), have been considered in this perspective, relevant outcomes are provided as well.

# **.Overview**

#### **..The general framework**

Experimental results highlight the presence of energy barriers that the molecules have to overcome to accomplish translocation []. Also, empirical distributions of blockade times show tail asymmetry typical of phenomena where the motion is due to diffusion and drift effects [, ]. However, even for a coarse-grained description of the protein dynamics, the energy landscape is still too high-dimensional to allow a concise representation of the process. It is hence convenient to project the system evolution over relatively few effective coordinates, Q1,...,Q<sup>k</sup> (termed reaction coordinates or collective variables), functions of the positions of the amino acids constituting the molecule []. A translocation coordinate should enable us to parametrize the walk of the protein along the pore. Since the evolution of each Q<sup>j</sup> is expected to undergo random fluctuations, we are interested in the probability, P Q1,...,Qk, t , for the system to visit a state Q1,...,Q<sup>k</sup> at time t, given an initial condition P Q1,...,Qk, 0 . Two main questions arise: i) What are the most representative reaction coordinates Q<sup>j</sup> in order to properly describe a translocation process? ii) What is the equation governing the evolution of P?

In biopolymer translocation, a set of reaction coordinates that we can consider appropriate should smoothly trace without ambiguity the pathways between the two states (cis and trans) with respect to the pore ends. The common cylindrical symmetry of the narrow pores suggests to project the chain dynamics along the channel axis (x-axis without loss of generality) and consider nugatory the transversal motion (y,z). Hence, the kinetics can be reasonably parametrized by a single reaction coordinate Q, expected to be a function of the x-coordinates of the residues only. The choice of Q is not a trivial task, there is neither a unique option nor a general rule, and usually an appropriate choice depends on the specific problem. For instance, in the case of a small protein pulled through a semi-infinite [] or long enough [] pore, the natural reaction coordinate can be either the x-position of the pulled residue or the x-component of the center of mass. Other authors found it reasonable to introduce, as reaction coordinate, the number of monomers in a given region of the space [, , ].

The derivation of an evolution equation for P, starting from the equations of motion of the full system, is generally problematic. Moreover, in general, the dynamics of Q in time is a stochastic process. A simplified approach is based on the assumption that the evolution of the collective variable is the slowest in the system, with a strong separation from other time scales, supposed irrelevant. Translocation is then considered completed when the stochastic process Q t crosses for the first time a threshold value Qth, corresponding to the protein exiting the pore from the trans-side. This point of view clearly defines a first-passage problem in a time window Tw, with exit time tout defined by

$$t\_{\rm out} = \min\_{t} \left\{ 0 \le t \le T\_w \mid Q(t) = Q\_{\rm th} \cdot \right\}$$

In this framework, the translocation phenomenology can be conveniently recast in the motion of an effective particle at place Q, undergoing a driven diffusion in a potential of mean force V Q []. Here, V Q, t G Q W Q and G Q is the free-energy profile in the variable Q of the unperturbed system, while W Q is the work done by the importing force. In a first approximation,

we can presume that the probability P Q, t obeys the Smoluchowski equation

$$\frac{\partial P}{\partial t} + \frac{\partial J}{\partial Q} = 0,\tag{4.1}$$

$$J(Q, t) = -\mu\_0 P \frac{dV(Q)}{dQ} - D\_0 \frac{\partial P}{\partial Q},\tag{4.2}$$

where µ<sup>0</sup> is the effective mobility coefficient and D<sup>0</sup> the effective diffusion one. At equilibrium, they satisfy the fluctuation-dissipation relation: µ<sup>0</sup> D<sup>0</sup> kbT . In (.), the contributions of the fast degrees of freedom are collected in a single diffusion term ∂ D0∂P ∂Q ∂Q. The Smoluchowski equation is satisfied by the probability to find the reaction coordinate at place Q, at time t. Such an equation pertains, thus, to a population of random walkers undergoing an overdamped Langevin driven diffusion in the potential of the mean force V(Q). As far as the results stemming from this approach are concerned (sec. ), (.) has not been solved directly, rather we have found it more convenient to solve several D Langevin runs from (.), an approach that results equivalent to solving the Smoluchowski equation.

$$\begin{aligned} \dot{Q}(x) &= \frac{-\nabla\_x V(Q(x)) + \Gamma(t)}{\gamma} \\ V(Q(x)) &= G(Q(x)) + W(Q(x)) \end{aligned} \tag{4.3}$$

In stochastic-process first-passage theory (see []), the basic quantity is the survival probability S t , given by

$$S(t) = \int\_{Q\_0}^{Q\_{th}} dQ \, P(Q, t),\tag{4.4}$$

that estimates the number of systems for which Q is still in the interval Q0, Qth as a function of time, corresponding to the protein still in the channel. Here Q<sup>0</sup> is the value of Q at the pore entrance. Hence, 1 S t amounts to the probability to be outside the pore at time t. Therefore its derivative

$$\psi(t) \coloneqq -\frac{dS(t)}{dt} = \int\_{Q\_0}^{Q\_{\rm th}} dQ \, \frac{\partial P(Q, t)}{\partial t}.$$

determines the distribution ψ t of the exit times. Thus, by using the Smoluchowski (.), we get:

$$
\psi(t) = J(Q\_{\text{th}}, t) - J(Q\_0, t), \tag{4.5}
$$

namely the exit-time distribution is fully determined by the values of the probability current J Q, t at the boundaries (pore ends). Common choices for a generic boundary condition Q<sup>B</sup> can be the following ones: absorbing P QB, t 0, reflecting J QB, t 0 or mixed J QB, t KBP QB, t . The last one seems to be more realistic for describing translocation processes, as real pore ends cannot be considered neither perfectly reflecting nor perfectly absorbing. The solution P Q, t to (.) cannot be found in general for an arbitrary potential. However, even without solving the Smoluchowski equation, the exact analytic expression can be found for the average translocation time τ<sup>b</sup> F,L and probability PTr F,L as a function of the importing force F and the pore length L [, , ]. The theoretical prediction of ψ t requires instead a complete closed solution. Therefore, only simple cases can be worked out or approximated expressions can be obtained [, , , ] and, in specific conditions, we can reproduce both simulation and experimental data. For instance, the inverse Gaussian function (.) is the first-passage time distribution of a diffusion process in the interval , L , starting from x 0, with absorbing boundaries in and L, and driven by a constant drift µ0F []. Despite its simple origin, the approach is able to match observed translocation time distributions under specific conditions [].

In [] the phenomenology of the transport appears more complicated than a simple driven diffusion, but the mild shape of the free-energy (Fig. .) permits a simple interpretation of the translocation kinetics in terms of a single jump over an extended barrier followed by a diffusion driven by a constant force. The theoretical predictions in [] were corroborated by translocation time histograms (Fig. .) which are well fitted by first-passage-time distributions ψ t , obtained by numerically solving the Smoluchowski equation (.) in the spatial coordinate x in 0, L . Only a constant-force drift has been used in [] under mixed boundary conditions, namely J 0, t K0P 0, t , J L, t KLP L, t , where K<sup>0</sup> and K<sup>L</sup> have been considered adjustable parameters.

It appears then necessary to calculate the free energy profile of the whole process to take into account the barriers that hamper the crossing. A quite common approach for similar issues refers to the umbrella sampling method [], a computational technique used to improve sampling of systems which phase space is dived into subdomains separated by high energy jumps. The umbrella sampling technique is based on the choice of a fictitious potential to overcome the energy constraints so to explore even the most unlikely areas of the phase space. The thermodynamics properties calculated in this way need to be deweighted from the influence of the artificial potential and recombined in order to provide the unbiased probability (free-energy). This purpose can be achieved, for example, by employing the weighted histogram analysis method (WHAM) [, ].

Figure .: Adapted from []. Probability distribution of translocation times for Ubiquitin translocation in a L 300Å pore []. The dashed line is an inverse Gaussian fit (.) of the histogram while the continuous line is the results obtained solving (.) in the domain 0, L with V x from Fig. ., constant pulling force and radiative boundary conditions.

#### **.Models and methods**

For the analyses presented in this chapter, an additional protein, the socalled Endoglucanase Cela from bacillus Agaradherans [] (PDB-ID:ah) is considered in addition to a mutated version of the MBP. The latter is called MalE as it is mutated on residues and . The mutation is obtained by simply quenching the native contacts belonging to the issued beads. Thermalization and translocation simulations are performed for both MalE and AH. The former ones are addressed to establish the stability of protein structures, the latter ones aim to further study the transport through the model pore described in section ., which is tailored to allow single-file translocation only. The translocation simulation protocol has been already described in the previous chapter and it is adopted for both MalE and AH transport runs. Simulations are performed at critical force Fc, i.e. the load intensity for which the translocation probability is 50%. This condition is believed the optimal compromise between statistical accuracy and the slow translocation speed needed to both amplify the stall duration and possibly obtain a quasi-equilibrium process. Results are not presented so much in depth as for the MBP because they do not add crucial information. However, features such as the translocation stalls will be shown along with the exposition of the D-continuum approach results. Such results stem from the achievement of the free energy profile in a meaningful reaction coordinate Q by umbrella sampling simulations and WHAM method []. Also, the free energy profile G Q is used in the formulation of the potential of the mean force to perform a set of one-dimensional Langevin runs in order to asses the suitability of this approach to interpret, retain and summarize the essential features of the D translocation dynamics, the one in (.).

The umbrella potential Vumb reads

$$V\_{umb}(Q(x)) = \frac{1}{2}k\_u \left(Q(x) - Q\_0\right)^2. \tag{4.6}$$

The translocation process is parametrized in terms of Q Nright Nlef t, with Nright and Nlef t the number of residues outside the pore on its right and left side respectively (Fig. .A). Translocation is thought to proceed from left to right when the pulled terminus is the C-terminus and from right to left for Npulled translocations. In other words, the cis-side of the pore coincides with the left side in the former case and with the right side in the latter one (the transside accordingly). This nomenclature is used consistently throughout, with Cpulling and N-pulling experiments proceeding, for example for the MBP, from Q 370 to Q 370 and from Q 370 to Q 370, respectively (as already stated, the MBP counts residues, i.e. Q 370 370 ). The associated color code used in the figures is light gray for N-pulling and dark gray for C-pulling. The collective variable Q can be expressed as a function of the x-axis coordinates of the residues in the following way:

$$Q(x) = \frac{1}{2} \sum\_{i=1}^{m} \left[ \tanh(ax\_i) + \tanh[a(x\_i - L)] \right],\tag{4.7}$$

with m the number of residues of the protein, Lthe pore length and a 3.0Å <sup>1</sup> a smoothing parameter. Equation (.) is a smooth version of the discrete variable Q Nright Nlef t. In order to compute the free energy profile, a set of umbrella sampling runs with different equilibrium values Q<sup>0</sup> is performed. In particular, the interval between different Q<sup>0</sup> is 4, hence, in the case of the MBP, 184 different runs are used to reconstruct the profile Q<sup>0</sup> 368, 364,..., 364, 368. We find it convenient to assign different values to the elastic constant k<sup>u</sup> of the harmonic umbrella potential depending on Q0, in order to better control simulation outcomes. In particular, such a shrewdness allow us to explore the whole region of the phase space associated with the reaction coordinate, maintaining at the same time a good overlap between adjacent histograms and a limited amount of sampling windows. For the sake of completeness, k<sup>u</sup> 0.22 2 for any Q0, where the greater values are used for the higher values of Q<sup>0</sup> . Indeed, higher values of Q<sup>0</sup> correspond to the regions of the phase space where it is easier to lose the protein in the bulk even for very small fluctuations of the reaction coordinate from the equilibrium value, i.e. the regions in which the molecule is close to the pore mouths. Initial configurations for the umbrella sampling runs are borrowed from the configurations explored during translocation simulations by selecting the ones closer to the Q0-value employed in the issued sampling. Each simulation runs for a time suited to collect proper uncorrelated statistics (1000 points for each histogram with a correlation time equal to 150 internal units; the latter value was determined by preliminary simulations). Moreover, the first

10<sup>4</sup> time units (equal to 10% of the translocation simulation time window) are not used for histogram collection, in order to allow the initial configuration to lose memory of the transport process and to foster the possible refolding at the trans-side. Multiple weighted histogram analysis method is applied to deweight the data arising from the procedure in order to obtain the free energy profile G Q . Further details on specific simulation protocols and employed quantities, such as the contact backward density, are given in section .

### **.Results**

In this section we show that the translocation of proteins across narrow pores is generally characterized by the occurrence of long stall events, which main features are essentially retained in a D description of the process involving the evolution of a proper collective variable, the one in equation (.), in the associated free energy landscape. This D approach is particularly appropriate when the translocation proceeds slowly, that is when the critical driving force is low enough. Indeed, the free energy profile is just a collection of thermodynamics states along the translocation pathway, which might not be in accordance with the non-equilibrium configurations explored during the transport.

In particular it is pointed out that both the ramps along the free energy trends and the translocation bottlenecks result to be strongly associated with the structural properties of the folded proteins. In fact, they correspond systematically to the regions along the backbone chain with a dense distribution of long-range native contacts with the untranslocated portion of the protein, the socalled backward contacts. These contacts, as we shall see, constitute a refinement of what already introduced in section . In other words, the backward contact density along the protein structure reflects into a step-like structure of the free energy, besides determining the stalls of the transport.

The D Smoluchowski/Langevin approach retains the essence of the translocation phenomenon, although some unavoidable limitations, due to the simplifications introduced by the D modeling, can arise, especially if the slip transport phases in between successive stalls in the D molecular dynamics (MD) simulations are very fast due to a high critical force needed to perform the passage.

The overall emerging picture suggests to decompose the free energy landscape into its entropic and energetic terms. This approach confirms the importance of the regions with a dense distribution of long range interactions, which govern both internal energy and entropy trends. In fact, once such bonds are broken, we recognize not only an increment in the average value of the internal energy, but also a similar jump in the entropic term. The latter one indicates that the unfolding associated with contact severance prevails upon the entropy reduction due to the pore confinement. Such a behavior is different from the one relevant to an unfolded protein, as the latter shows a decreasing (or stationary) entropic contribution trend.

Numerical simulations involving different proteins support the conjecture that these results are not limited to the MBP specific case, but are instead broad. For the sake of clarity, the approach is first presented referring to the MBP. Successively, the generality of the results is discussed by addressing the specific MBP mutant MalE and the globular protein AH, which has been selected for its substantially different native topology.

Figure .: Here and in the following figures units are internal, i.e. related to the coarsegrained model we used. Distances, when present, are in Å. Panel A: Nleft, Nright and Npore are the number of monomers on the left side, right side and inside the pore. It is always conventionally assumed that the first monomer entering the left side is the Cterminus (dark gray) while the translocation from the right side are pulled from the Nterminus (light gray). Hence the protein translocates left-to-right or right-to-left for Cand N-pulling, respectively. Panel B: Free-energy (G) profile from umbrella sampling simulations of MBP in the collective variable Q Nright Nleft. The curves represent the average residence time τ<sup>b</sup> Q for non-equilibrium MD simulations (filled) and for the D approach based on the Smoluchowski equation (.) (dashed). For N-pull the τ<sup>b</sup> Q scale is reversed (the values being reported in parentheses and in light gray on the right y-axes) while τ<sup>b</sup> Q stemming from the Smoluchowski equation are rescaled in order to match the background values.

The average residence time τ<sup>b</sup> Q that the MBP (R<sup>c</sup> 7.5Å in the present investigation) spends in the configurations associated with Q during the translocation (Fig. .B), clearly indicates that the transport is not uniform. Indeed, most time is spent in the same specific states found in chapter for R<sup>c</sup> 6.8Å (events C-st and C-st for C-pulling and N-st, N-st, N-st for N-pulling). In Figure .B the free-energy profile G Q is also reported: the barriers of G Q correspond to the stalling events. To obtain the profile we employ the procedure described in section by using as initial configurations both the conformations relative to the C and N-pulling translocation runs. In fact, from the MBP-G Q profile obtained by using as initial configurations only those from C-pull runs and reported in dark gray in the upper left panel of Figure ., it is clear that the average ascending trend in the region Q 0 does not correspond to an analogous average descending trend for Q 0. Since outside the pore (Q 370 and Q 370) the thermodynamics conditions are the same for both sides of the channel, the relative free energy values should be equal. Such a prediction is not reproduced by a profile achieved by using only initial configurations which belong to either C or N-terminus translocation runs (single profile). This is presumably due to the long time needed for the refolding process at the trans side to take place. Indeed, the same behavior is confirmed for the MBP N-pull runs (light gray curve in the middle panel of Fig. . left side): for Q 0 (we conventionally consider the N-terminus pulled translocations to proceed from right to left) the ascending trend overwhelms the decreasing one observed when Q 0. In order to obtain a single representative G Q profile for the entire process, half

Figure .: Free energy profiles G Q obtained from umbrella sampling runs employing initial configuration taken only from C-pull (dark gray) or N-pull (light gray) translocation runs. The black line in the lower panel is the profile obtained using the two previous profiles half by half. The left panels refer to MBP and the right ones to AH.

of the umbrella sampling runs obtained from the C-terminus pulling initial conditions (Q<sup>0</sup> 0) are combined with the complementary simulations relevant to the N-terminus case (Q<sup>0</sup> 0). The resulting profile is plotted in the lower panel in Figure . and it is the one reported in Figure .B. The right panels of the former figure refer to AH, for which the same approach holds. As a further indication that the anomalies of the single profiles are associated with the time needed for the refolding process, we report in Figure . the free energy profile for an unfolded protein, R<sup>c</sup> 3.0Å obtained from umbrella sampling simulations exploiting just the C-pull run initial configurations. In this case G 0 also for Q 370.

Figure .: G Q profile for denatured MBP protein (R<sup>c</sup> 3Å) obtained from umbrella sampling employing initial configuration from C-pull translocation runs only.

As common in translocation problems, the free energy trend is used in the practical implementation of the Smoluchowski equation in order to compare the non-equilibrium MD runs with the D continuum approach. In this perspective, a suitable mapping of the importing force F<sup>x</sup> employed in the MD runs on the effective force F<sup>Q</sup> ∂W ∂Q used in the stochastic model is required. A gross estimate for F<sup>Q</sup> can be obtained by assuming that, in the MD case, each position of the application point of the force along the x-axis corresponds to a unique value of the collective variable Q. In principle, from translocation experiments, it is possible to extract the conditional probability of observing Q as a function of the force application point x, Pˆ Q x . It turns out that Pˆ Q x is sharply peaked, thus allowing to use a (monotonically increasing) deterministic function to map the MD onto the D-approach forces by using F<sup>Q</sup> ∂W ∂x ∂x ∂Q Fx∂x ∂Q. Indeed the molecular dynamics data show that when the pore is full (Nlef t 0 and Nright 0) the distribution of Npore is tight around its average value, N¯pore 38 (Fig. .) upper panels. Hence, a single monomer entering from the cis side corresponds to a single monomer exiting from the trans side (single-file motion). The net variation of Q, ∆Q 2, is associated to a displacement of the force application point substantially given by ∆x L N¯pore 2.5 . Thus it follows that F<sup>Q</sup> Fx∆x ∆Q 1.25Fx. Instead, when either Nlef t 0 or Nright 0, we get ∆Q 1 and F<sup>Q</sup> Fx∂x ∂Q 2.5Fx.

The dashed lines in Figure .B represent the average residence time τ<sup>b</sup> Q obtained by solving a set of D Langevin runs governed by (.). It is clear that the peaks coincide with the stalls observed in the MD runs, thus confirming the accuracy of the D driven diffusion interpretation.

Figure .: Upper left. Npore distribution for translocation runs (both C and N pulling) for the MBP. Upper right, same data for AH. Lower Left. Scatter plot of the amino acid at the pore mouth on the cis-side as a function of the reaction coordinate Q during MBP translocation simulations (C-pull dark gray, N-pull light gray). The maps eqs. (.) and (.) are also depicted as straight dotted lines. Lower Right. Same data for the AH translocations.

It has just been shown that the Smoluchowski picture is appropriate to describe the essence of the translocation process. Now we show also that the main features of both G Q and τ<sup>b</sup> Q are associated with specific properties of the molecule native structure. In particular, we observe that the stalling events are related to the entrance in the pore of the regions richer in long range contacts formed with the untranslocated portion of the protein. The main idea behind this approach is that only the interactions between the parts of the molecule that still dwell on the cis-side are effective in provoking the stalls. As a first attempt to quantify such a remark, we calculate for each amino acid the backward contact index B, i.e. the number of contacts that the issued amino acid forms with the still untranslocated residues. First of all it is necessary to establish a map between Q and the amino acid at the pore mouth, in order to express directly the backward contact density as a function of Q, alike τ<sup>b</sup> and G. This view allows comparing the native contact clusters with the features of the free-energy profile, even without resorting to a dynamical analysis of the MD data. However, the average number of residues inside the pore, when it is full, is required to be, with a good approximation, a constant of the motion. In Figure . a scatter plot of Q versus the amino acid at the cis pore mouth, iCIS , is also shown in the Coarse-grained molecular dynamics and continuum models

lower panels. Such an amino acid is selected by running back upon the chain of the protein, starting from the foremost residue along the pore axis up to the first residue which x-coordinate falls in the range 1 3 . Also, its distance from the pore axis has to be lower than the channel radius. For each Q the distribution of the amino acids that are found at the pore mouth is very tight (the map is practically one to one and onto). The dashed lines show such a correspondence by assuming N¯pore 38 and by using (.), or (.) to map Q and iCIS. The map for the C-pulling case is

$$\begin{array}{l} i\_{CIS} \in \left[1: \bar{N}\_{pore}\right] \quad Q = i\_{CIS} - m, \\ i\_{CIS} \in \left[\bar{N}\_{pore}: m\right] \quad Q = 2i\_{CIS} - \bar{N}\_{pore} - m, \end{array} \tag{4.8}$$

while the N-pulling map reads as

$$\begin{array}{l} i\_{CIS} \in \left[ m: m - \bar{N}\_{pore} \right] \quad Q = -i\_{CIS}, \\ i\_{CIS} \in \left( m - \bar{N}\_{pore}: 1 \right] \quad Q = m - \bar{N}\_{pore} - 2i\_{CIS}. \end{array} \tag{4.9}$$

Such a correspondence is mainly used to map the backward contact index relative to the amino acid at the pore mouth onto the reaction coordinate Q, by assuming that N¯pore is a constant of the translocation dynamics.

For a protein pulled from the N-terminus (first residue), the backward contact index for the i-th amino acid reads

$$B\_i^{\{N\}} = \sum\_{j=i+b\_t}^{m} M\_{ij},\tag{4.10}$$

where Mij is the contact matrix (Mij 1 if residues i and j are in native contact and 0 elsewhere) and b<sup>t</sup> a threshold introduced in order to consider only the native contacts formed with the residues that are relatively far in the sequence. Here we use b<sup>t</sup> 20, a distal distance that excludes, for instance, the interactions responsible for helix structures. Indeed, such structures do not play a considerable rôle since helices are not required to completely unfold to enter the pore. A smoothing procedure (running average) is then performed in the interval i 5 i 5 . The resulting function is indicated as B˜ <sup>N</sup> <sup>i</sup> . Such a procedure is a refinement of what already illustrated in section as, on one hand, only the backward distal contacts are considered and, on the other hand, no dynamical data are required aside from a preliminary evaluation of N¯pore. An analogous expression holds for C-pulling:

$$B\_i^{\{C\}} = \sum\_{j=1}^{i \blacksquare b\_t} M\_{ij}. \tag{4.11}$$

Of course, when the amino acid i is supposed to be at the pore mouth, we get i iCIS and the maps (.) or (.) between iCIS and Q hold. In Figure .

the smoothed version B˜ of the backward contact index B is reported for C and N-pull from just a static analysis of the MBP contact map. It is noticeable that the ascending ramps in the profile correspond with the peaks in the backward contact density B˜.

In order to confirm this picture, we remove selectively the native contacts own of the amino acids and (two residues mainly involved in the stall C-st) and repeat both the non-equilibrium MD translocations and the entire umbrella sampling protocol. The result is that the residence time peak C-st is smaller (Fig..) and, correspondingly, the G Q slope in that region is lowered (Fig. .).

Figure .: Burial backward <sup>B</sup>˜ for C (dark gray) and N-pull (light gray). The solid line is the free energy profile of the MBP (already reported in Fig. .B) while the dashed one is the profile of the MBP-mutant, for which the native contacts of amino acids and are removed.

Figure .: Comparison of residence time τ<sup>b</sup> as a function of the collective variable Q for wild-type MBP (dashed lines) and MalE (solid lines) for both C (dark gray) and N (light gray) pulling.

To understand if the triple correspondence between average residence time τ<sup>b</sup> Q from non-equilibrium runs, free energy profile G Q and protein structural features (i.e. the backward contact density B˜) is specific of the MBP or is a more widespread property of protein-like structures, we repeat the analysis for a different globular molecule (AH) characterized by a substantially different native topology with respect to the MBP (see Fig. . for a comparison between contact maps; R<sup>c</sup> 7.5Å for the MBP and R<sup>c</sup> 7.2Å for AH). In particular, native contacts are more uniformly distributed for the MBP and are mainly associated with either distal clusters or β-sheet interactions. In contrast, for the AH molecule, Lennard-Jones potentials concern primarily adjacent α-helices, in addition to a small set of clusters that the region close to the N-terminus forms with the rest of the structure. In the bottom panel of Figure . the quenched contacts in MalE are highlighted in dark gray. The protein AH is formed

Figure .: Native contacts are reported in transparent light gray circles. The dark gray circles in the bottom panel correspond to the fourteen contacts removed in order to obtain the mutated Maltose Binding Protein, MalE.

by residues (hence Q 300, 300 ). In Figure ., G, τ<sup>b</sup> and B˜ for both C (dark gray) and N-pull (light gray) are reported as a function of Q. The dynamics, similarly to the MBP, is characterized by several stalls. For both N-pull and C-pull it appears that the peaks in τ<sup>b</sup> Q coincide with the ones in the corresponding B˜ trend. Moreover, the larger clusters of native contacts are associated with the barriers in G Q .

Here the Smoluchowski picture (dashed lines) is appropriate to summarize the D phenomenology, mainly for the C-pulling case, as the N-pull is characterized by a very high value of the critical force. Such a force probably reduces the suitability of the D approach in the equilibrium potential dictated by the free energy profile, at least from a quantitative point of view. In Table . critical forces F<sup>c</sup> are summarized for the translocation simulations performed. For the mutated MBP the critical force can be assumed to be equal to the one pertinent to the wild type structure.

Figure .: AH. The filled curves in the main panel represent the average residence time τ<sup>b</sup> from D MD translocation runs, while the dashed lines represent the outcome of the D Smoluchowski approach. The solid line is the free energy profile G Q from umbrella sampling simulations. The upper and lower panels report the backward contact index <sup>B</sup>˜ for C (dark gray) and N-pull (light gray).


Table .: Critical forces for the different MD translocation runs performed. The critical force of the mutated MBP coincides, within the statistical error, with the critical force of the corresponding wild type structure.

Finally, we analyze the different contributions to G Q in order to understand what are the dominant ones for the observed phenomenology. In Figure .A, the total internal energy VGo¯, the free energy G and the entropic contribution T S are reported for the wild type MBP as a function of Q. The energetic contributions are calculated as conditional average by using the configurations from the umbrella sampling runs. The entropic term is T S Q VGo¯ Q G Q . All contributions are conventionally set to zero for Q 370 (protein on the left side of the pore). During the transport VGo¯ increases since the protein needs to unfold to translocate and, consequently, native contacts are stressed up to rupture. A similar behavior is found for the entropy. Indeed the break-up of native contacts results in a less compact structure and in a disorder increment, up to a maximum corresponding to the protein straddling the pore symmetrically (the unfolding associated with contact detachment overwhelms the confinement effect of the channel). Entropic and energetic contributions are

### Coarse-grained molecular dynamics and continuum models

closely correlated one another and, as usual, with the clusters of native contacts. Also, the entropy profile differs from the one pertinent to an unstructured polymer, which is provided in Figure . upper panel (MBP R<sup>c</sup> 3.0Å). As it is expected, the entropic term shows just a barrier at the beginning of the translocation, i.e. up to the exit of the first residue on the trans side, after which the landscape is practically flat. Panel B of Figure . reports the main contributions to VGo¯, namely Vnat and Vφ. The dashed line Vnat marks a heuristic estimation for Vnat, obtained by integrating the balanced version of the backward contact density subsequently described.

Figure .: Upper: Energetic (dark gray circles) and entropic (light gray squares) contributions to the free energy profile G Q for wild type MBP. Lower: Main contributions to VGo¯: Vnat (light gray diamonds) and V<sup>φ</sup> (dark gray triangles). The dashed line represent the heuristic estimation of Vnat based on the backward contact density.

For the sake of clarity, a translocation pulled from the C-terminus is exemplified. A quantitatively identical result can be obtained by considering the N-pull translocation. As first step we calculate the quantity

$$
\tilde{B}^{(C)^\*} (Q) = \tilde{B}^{(C)} (Q(i)) - \tilde{B}^{(N)} (Q(i + \bar{N}\_{pore})),\tag{4.12}
$$

where the term B˜ <sup>N</sup> Q i N¯pore represents the contribution to the internal energy due to the refolding on the trans side. In fact, when residue i dwells at the pore mouth on the cis-side, residue i N¯pore dwells close by the trans-side

pore entrance. Integration of (.) provides an estimation for the net number of broken contacts during the translocation. Since each contact amounts for an energy jump equal to , a rough estimation of the native contribution to the internal energy can be calculated as

$$V\_{nat}^{\*}(Q) = \int\_{\!\!\! \! \! \! / \! \! / \! \! /}^{Q} \tilde{B}^{\{C\}^{\*}}(q) \mathrm{d}q. \tag{4.13}$$

The approach is a further confirmation that the essence of the translocation is hidden in the structural properties of the native configurations, which are in turn mainly determined by the long range native interactions.

To conclude, we mention that analogous results in terms of energetic and entropic contributions hold for the AH protein (Fig. . lower panel), thus confirming the overall picture.

Figure .: Energetic (dark gray circles) and entropic (light gray squares) contributions to the free energy profile G Q for unfolded MBP (R<sup>c</sup> 3Å upper panel) and for AH (lower panel).

#### **.Remarks**

In this chapter it has been show that the stalling events that characterize the dynamics of a protein-like structure in a single-file translocation are associated to the structural properties of the native configuration. The general setting holds for all the cases analyzed. However, the stall sequence is specific for each protein and constitutes a sort of signature of the molecule. The identification of the features responsible for the bottlenecks of the transport allow us to formulate a heuristic procedure able to estimate the native bound contribution to the free energy profile, on the sole basis of the protein native structure and N¯pore, by simply integrating a balanced version of the backward contact index profile. These occurrences appear to be generic, not associated just to a specific globular protein, thus opening the way to systematic pre-screening of the proteome.

Also, the intrinsic difference between folded and unfolded structures in terms of the entropic contribution relevant to the translocation process has been pointed out. In the former case the confinement effect due the pore appears to be overcome by the severance of native contacts, hence confirming the importance and leading role of the long range interactions in this kind of dynamics. On the other hand, for unstructured polymers, the free energy landscape is dominated by a single entropic barrier at the beginning of the transport.

Finally, the one-dimensional interpretation of the process as a driven diffusion along a proper potential of the mean force provides overall good results. Indeed, it retains the essence of the D translocation dynamics like, for example, the bottleneck scenario. However, quantitative comparisons appear to be too ambitious, especially if the critical force pertinent to the transport is high, as for the AH N-terminus pulling. In fact, in this case, once the first stall has been overcome, the translocation proceeds extremely fast and hence it is poorly described by a quasi-static approach.

# **Chapter -Dimensional Continuum Models**

#### **.Preamble**

In the present chapter we propose a continuum model for the description of the dynamics of isolated macromolecules. We adopt a second-rank tensor as a descriptor of the macromolecular shape and identify the action governing its dynamics by means of an identification procedure from a discrete scheme, based on power equivalence and Cauchy-Born rule. We compare molecular dynamics stretching simulations with the continuum model by starting from discrete toy schemes, going on with increasing complexity, and ending with the analysis of the already encountered Ubiquitin protein. The results indicate limitations in the approach in case of unconstrained molecular dynamics while they show appropriateness for driven dynamics.

In the second part of the chapter, simulations relevant to the MBP protein are also presented. In this case stretching and, above all, translocation dynamics are considered. The selected approach is able to capture the shape evolution of the molecule, providing that the pore is long enough to provoke a considerable stretching. However, limitations totally analogous to the Ubiquitin protein (and hence not shown for brevity) still hold for equilibration simulations.

Such shortcomings are ovecome thanks to empirical formulations of the morphogical descriptor, or rather of the quantity that govers its dynamics, as we will see.

#### **. Introduction**

Biological materials have in general intricate architecture at finer spatial scales. Multi-scale and multi-field views are often called upon to offer a setting for the description of their mechanical behavior. Even the analysis of processes involving a single macromolecule requires care. This is the case, for example, of protein stretching in free space (performed, for example, in atomic force microscope experiments). In analyzing such a process, both discrete and continuous views are adopted. Numerical simulations on discrete schemes (both full atoms and coarse-grained molecular dynamics) are developed to analyze stretching and transport (see, e.g., [, , , , ]), and to determine free-energy profiles (see, e.g., [, , , ]) which are used often to obtain closed form solutions for first passage time statistics (see [] for relevant theoretical issues).

Models at continuum level are then used to summarize the results of numerical simulations, by reducing the dynamics of discrete material points to the one of a single descriptor of the molecule. The common view is that of using data to describe just the motion of the molecular mass center or of a similar collective variable leading to one-dimensional dynamics (see, e.g., [, ], and references therein). No information on molecular shape evolution in time are added.

Here we propose a continuum model which captures, at least in a coarse way, the shape evolution of an isolated protein. So, we attribute to the molecule a (time-dependent) second-rank tensor ν with positive determinant as a descriptor of its shape. ν does not coincide with the inertia tensor. It is a linear operator that we define on the basis of Cauchy-Born rule. In fact, we imagine to fix a shape of the molecule that we take as a reference, and represent it by a discrete set of material points. ν is then defined as the tensor such that the velocity of the i th material point is given by the time rate of ν applied to the i th reference position vector.

The evolution of ν is governed by a balance of (microscopic) actions on the molecule. At continuum level we identify such actions in terms of the ones occurring between every material point and its first, second, and third neighbors in the discrete scheme, in addition to possible long-range interactions. The identification is made in terms of power equivalence. The values of the actions in the discrete model are obtained by numerical simulations. Then the continuum evolution of ν is depicted in terms of the eigenvalues of the symmetric component of its polar decomposition.

We compare molecular dynamics stretching simulations with the continuum model. We consider first discrete toy schemes and go on with increasing complexity, then we end with the analysis of the Ubiquitin protein. The results indicate limitations in the approach in case of free dynamics of the macromolecule while they show appropriateness for constrained dynamics, allowing us to accept the model for a number of constrained motions such as protein mechanical unfolding.

Finally, MBP translocation is also taken into account (there is not substantial difference between Ubiquitin and MBP thermalization and mechanical stretching results, hence MBP data are not reported for brevity, aside from just one remark in sec. .). We shown that the appropriateness of the approcah still holds in MBP driven and confined dynamics. At the end of the chapter, we also point out that the limitations concerning free equilibration are overcome by selecting empirical formulations of the so-called self-action (a quantity that, in the present setting, governs the evolution of ν).

# **.Models and methods**

# **..Discrete scheme**

The Ubiquitin protein in Figure . is described by the already introduced (sec. . of chapter ) C<sup>α</sup> backbone coarse-grained Go-like model (see [ ¯ , ] for the original approach, [] for the refined version implemented here and also [, , , ], for its use in describing protein stretching, thermalization and translocation). The scheme is also equivalent to what is presented in [] where the same molecule (PDB-id:ubq) is considered. Values of the parameters appearing in protein potentials and measure scales are reported also in []. If not

Figure .: From left to right: full-atom picture, secondary structure features and coarsegrained C<sup>α</sup> carbon atoms along the backbone chain.

differently specified, we assume also R<sup>c</sup> 3.0A˚or R<sup>c</sup> 6.5A˚as values of the Go model cut-o ¯ ff radius. The Go potentials (see sec. ¯ . of chapter ) constitute a basis for modeling atomic interactions not only in the Ubiquitin structure but also in the other sets of atoms in space, as it will be pointed out in what follows when necessary. The same approach holds also for the MBP.

# **..Continuum model**

As anticipated in the introduction, at continuum level the dynamics of a single protein is commonly depicted just by evaluating the subsequent positions in time of its centre of mass (see e.g. []). A more refined natural descriptor of the molecular morphology is the moment of inertia of the atomic cluster. When we use it as descriptor of the molecule, as Ericksen suggested in [], its time evolution involves the sum of the tensor moment of momentum of each single atom constituting it.

We do not follow such a view, but we are inspired by it and select a secondrank tensor as a descriptor of the molecular shape, a tensor associated with an idea of homogeneous deformation of a region containing the molecule in a shape that we take as a reference (in this sense we are closer to what suggested in [] for the mechanics of granular media). So, we select a compact region e in threedimensional space R<sup>3</sup> – the volume of e is indicated by e – as a large enough portion of space containing the molecule in a shape that we take as reference (the equivalent sphere radius ranges in the interval [,]A˚, about one and a half to two times greater than the Ubiquitin gyration radius). Let r<sup>i</sup> be the vector position of the i th atom – or material point if we refer to a coarse discrete representation of the protein – in e in the reference place. We consider a secondrank tensor ν with positive determinant, which depends on time t, such that at the instant t the velocity v<sup>i</sup> of the i th material point in the discrete scheme is simply given by ν˙ r<sup>i</sup> – essentially we are imagining that e is endowed with an homogeneous strain rate and the atoms of the molecules are dragged along the process by the homogeneous deformation.

The assumption has to be interpreted 'in average'. It has clear limitations because the unfolding of a protein can develop in a non-homogeneous way by involving subsequent portions. However, the assumption

$$\mathbf{w}\_{i} = \dot{\nu}r\_{i},\tag{5.1}$$

which is motivated by Cauchy-Born rule , allows us to identify clearly the action of the protein over itself.

We presume also that ν be endowed with positive determinant, so that, by polar decomposition theorem, we can write

$$\nu = RU,\tag{5.2}$$

where R SO 3 and U is a second-rank symmetric tensor.

In the discrete representation of the macromolecule, as described previously, call fij the action (a force, a covector indeed, that we think derived from a potential, the one introduced previously) between the material points i and j, applied to the point i. The density of power πdiscr developed in the discrete scheme is then given by

$$
\pi\_{discr} \coloneqq \frac{1}{|\mathbf{e}|} \sum\_{i \neq j \in \mathbf{e}} f\_{ij} \cdot \mathbf{v}\_i \cdot \mathbf{e}
$$

In the continuum representation, the rate ν˙ has a conjugated action which is defined by the power π<sup>c</sup> which is necessary to develop over the molecule, considered as a whole, to get the rate ν˙ itself. Since ν˙ is a tensor, the power-conjugated action is a tensor too – call it z¯<sup>s</sup> – so that π<sup>c</sup> is defined by

$$
\pi\_c \coloneqq \bar{z}\_s \cdot \dot{\nu}.
$$

 See [] and [] for extended analyses of such a rule.

Moreover, since z¯<sup>s</sup> summarizes the actions in the protein (it is a self-action, indeed), and we are not considering for the moment other actions over it, interaction with the external environment being assigned, in this case, by applied strain to the ends of the protein, z<sup>s</sup> should satisfy the balance equation

$$
\bar{z}\_s = 0.\tag{5.3}
$$

However, in principle z¯<sup>s</sup> could be imagined as the sum of a conservative part, zs, and a dissipative one, z<sup>d</sup> <sup>s</sup> , dissipation being developed during the evolution of the molecule – so we should have z¯<sup>s</sup> z<sup>s</sup> z<sup>d</sup> <sup>s</sup> . The dissipative nature of z<sup>d</sup> s would be then expressed by the inequality

$$z\_s^d \bullet \dot{\nu} \ge 0,$$

which should hold for any choice of ν˙, implying so an expression for z<sup>d</sup> <sup>s</sup> of the type

$$z\_s^d = a\dot{\nu}^\flat,\tag{5.4}$$

with a a positive constant and ν˙ the dual version of ν˙. Precisely, with e1, e2, e<sup>3</sup> a vector basis in the three-dimensional ambient space, and e<sup>1</sup>, e<sup>2</sup>, e<sup>3</sup> its dual counterpart, we have ν˙ ν˙ ij e<sup>i</sup> e<sup>j</sup> and ν˙ ν˙ij e<sup>i</sup> e<sup>j</sup> . The reason of the occurrence of ν˙ in (.) is that this way the product aν˙ ν˙ aν˙ijν˙ ij is naturally defined without the additional structure of the scalar product. However, in what follows we shall not make difference between covariant and contravariant components because we shall always tacitly use the identification of R<sup>3</sup> with its dual. As a consequence the balance (.) would reduce to the evolution equation

$$a\dot{\nu} = -z\_s.$$

In particular, for the sake of simplicity we shall assume later that a 1, so that the evolution equation that we shall consider will be

$$
\dot{\nu} = -z\_s.\tag{5.5}
$$

If so, the power π<sup>c</sup> would result the sum

$$
\pi\_c = z\_s \cdot \dot{\nu} + \frac{1}{2} \left| \dot{\nu} \right|^2 \dots
$$

The problem is now the explicit expression of z<sup>s</sup> in terms of the discrete actions fij . To get it, we impose here the identification of πdiscr with z<sup>s</sup> ν˙: the power of the (potential dependent) actions in the discrete scheme should be the one of the conservative part of the self-action. We then write

$$z\_s \cdot \dot{\nu} = \frac{1}{|\mathbf{c}|} \sum\_{i \neq j \in \mathbf{c}} f\_{ij} \cdot \mathbf{v}\_i \cdot \mathbf{s}$$

Then, thanks to the arbitrariness of ν˙, by using (.) we finally get

$$z\_s = \frac{1}{|\mathfrak{e}|} \sum\_{i \neq j \in \mathfrak{e}} f\_{ij} \otimes r\_i. \tag{5.6}$$

In presence of (conservative) external forces on single material points of the discrete representation of the molecule, the previous expression of z<sup>s</sup> should be augmented by the tensor product of these forces by the position vectors of the material points where they are applied. So, if ¯f<sup>k</sup> is the (conservative) force applied to the k th material point, with k 1, 2, ..., q, with q the number of atoms where external action act, we then have

$$z\_s = \frac{1}{|\mathfrak{e}|} \left( \sum\_{i \neq j \in \mathfrak{e}} f\_{ij} \otimes r\_i + \sum\_{k=1}^q \bar{f}\_k \otimes r\_k \right).$$

Since we are interested in the changes in molecular shape, in the numerical analyses presented later we shall compute essentially the eigenvalues λ of the symmetric tensor U appearing in the polar decomposition of ν, and determining the semi-axes of an ellipsoid in three-dimensional space.

Finally, in what follows, the placement vectors r<sup>i</sup> might be referred to different frames. However, in the case of Ubiquitin and Maltose Binding proteins, only placement vectors with respect to the center of mass of the molecules will be considered. Obviously, explicit coordinates will depend on the frame selected to evaluate both the center of mass and bead positions.

#### **.. Stretching simulations: methods**

We summarize here the methods that we use in developing later numerical simulations.

Stretching simulations are performed by means of a scheme based on constant velocity steered molecular dynamics (see []). Initial conditions are obtained by the so-called 'free equilibrium'. At the beginning of each numerical test the tensor ν is set to be equal to the identity. Protein detachment is achieved by a spring of elastic constant k 1ϵ, linking a fictitious atom with a terminus of the molecule. The fictitious atom is pulled at constant velocity. Its motion drags the terminus where it is bonded. We perform two different stretching protocols.


For an enlarged view on this identification procedure in the setting of mechanics of complex materials, see [].

The stretching direction coincides with the vector joining the two ends of the molecule in the initial configuration. We investigate different stretching velocities vstr, temperatures T and cut-off radii Rc. When maximum elongation is reached (the molecule is in an almost single-file conformation), the protein can be either kept stretched or left free to fluctuate.

For the other smaller sets of atoms, either the fictitious atom external spring (the one connecting the fictitious material point to the molecule) or the direct enforcement of a suited constant velocity motion to some points of the system have been used as strategies to obtain a deformation dynamics.

#### **..Translocation simulations: methods**

As far as MBP translocation simulations are concerned, nothing changes from what described in sec. . but the pore length, which ranges in the interval 100 800 A˚, depending on the simulation perfomed.

#### **.Results**

#### **..Toy simulations and related remarks**

Before looking at the dynamics of the entire Ubiquitin molecule at discrete scale and within the continuum approximation proposed, we analyze some simpler schemes – toys, indeed – to investigate the limits of the discrete-to-continuum identification that we have adopted here.

**D biatomic systems.** First we consider the evolution of ν for a trivial harmonic oscillator, composed by two material points, indicated by A and B, and connected by a spring of elastic constant kh. In this case ν, defined previously, reduces to a scalar. The scheme, sketched in panel a of Figure ., allows us to make some elementary predictions that are in agreement with the numerical analyses. The simulation temperature is set equal to zero and constant velocities are assigned to the two atoms.

Let r<sup>0</sup> be half of the inter-atomic distance when the spring between points A and B is at rest (in other words, in a configuration that we set as a reference), that is r<sup>A</sup> t 0 r<sup>0</sup> and the same is for B.

When two equal and opposite velocities v<sup>0</sup> t are assigned to points A and B, for the motion we get

$$r\_A(t) = -r\_0 - \int\_{t\_0}^t v\_0(\tau)(\tau - t\_0)d\tau,\ \ r\_B(t) = -r\_A(t).$$

### Coarse-grained molecular dynamics and continuum models

Figure .: Two-atoms system. Here and in all the other figures lengths are expressed in Å and the time in the relevant 'code unit'. Panel a: Sketch of the two-atoms mass-spring system. Panel b: Evolution of ν, a scalar in this case, for a standard elongation/compression cycle, run twice. The asymmetric evolution of ν would generate an almost-monotonic trend if the cycle was performed over and over again. The evolution of ν is here compared with the radius of gyration rgir of the set of atoms. Panel c: It is the same of panel b except that here the specular evolution of ν is achieved because internal distances r<sup>G</sup> <sup>i</sup> appearing in the expression of z<sup>s</sup> are kept constant to the reference values during the whole dynamics. We note a delay in the evolution of ν with respect to the trend of the radius of gyration as the former keeps on increasing (or decreasing) during the approaching phases (i.e. when the system is moving toward the rest position from maximum detachment or compression). Also, ν never shrinks further than its initial state. Panel d: Effect of the velocity on the evolution of ν. Despite an overall equivalent deformation testified by the evolution of the gyration radii in the inset, the amplitude of the trends of the ellipsoid is different if the velocity is changed, i.e. it is dependent on the rate of deformation.

By indicating with fAB the force that point A exerts on point B, we can write

$$f\_{AB} = -k\_h(r\_B(t) - r\_A(t) - 2r\_0), \quad f\_{BA} = -f\_{AB}.$$

We develop an elongation cycle in which points A and B are pulled far apart by imposing opposite constant velocities v<sup>0</sup> to A and B. The initial position with the atoms at rest is obviously regained by reversing the velocities for the same amount of time spent in the elongation phase, i.e. t<sup>e</sup> ta, where the index e indicates the elongation time and a the time spent in the re-approaching process. The analysis of the previous equations is trivial and the solution can be plugged into the definition of z<sup>s</sup> (where for simplicity we take e 1).

Assume t<sup>0</sup> 0. From the coarse grained balance of overall actions in the continuum scheme, we find two expressions for z<sup>s</sup> that we label with the apex e in the elongation phase, and with a when the two particles re-approach one another by reverse velocity. We get

$$z\_s^e = -4k\_h \left( r\_0 v\_0 t + v\_0^2 t^2 \right), \quad z\_s^a = -2k\_h \left( v\_0 t - 2r\_0 \right)^2.$$

Time integration of (.) along an extension-compression cycle starting from an initial value v<sup>0</sup> then leads to

$$\begin{aligned} \nu\_e &= \nu\_0 + \int\_0^{t\_e} 4k\_h v\_0 \{r\_0 \tau + v\_0 \tau^2\} d\tau \\ &= \nu\_0 + 4k\_h \{v\_0 r\_0 \frac{t\_e^2}{2} + \frac{1}{3} v\_0^2 t\_e^3\} = \frac{10}{3} k\_h t\_e r\_0^2, \\ \nu\_a &= \nu\_e + \int\_{t\_e}^{t\_a} 2k\_h \{v\_0 \tau - 2r\_0\}^2 d\tau > \nu\_e. \end{aligned}$$

The last inequality arises for z<sup>a</sup> <sup>s</sup> is negative in the whole approaching process. So, we have a drawback: the evolution of ν is monotonic although in the approaching phase the two atoms are getting closer toward the initial configuration. Also, if we we start compressing, the effect on ν is in magnitude different from the extension dynamics. Indeed the expression of z<sup>s</sup> for such a case is

$$z\_s = 2k\_h r\_0 v\_0 t - 2k v\_0^2 t$$

The two terms on the right hand side are opposite in sign: the evolution of ν results smoother and slower than in the elongation phase. In other words, a symmetric deformation process does not generate a symmetric evolution of ν. First ν does not recover the initial condition once the two atoms are back to the rest position. It also does not regain the starting value even when a compression follows the initial elongation, as clearly depicted in panel b in Figure .. In other words, ν never 'shrinks' below the initial value. The evolution process quickly degenerates in an almost monotonic trend, if the cycle elongation-contraction is repeated several times. Moreover, if opposite sinusoidal motions are imposed to points A and B, in place of the constant velocity drifts, a complete monotonic evolution of ν arises, as it results from the subsequent elementary analysis.

In the sinusoidal case, the harmonic displacements (constants do not have here a prominent rôle) are given by

$$r\_B(t) = r\_0 + D \sin(D^\prime (t - t\_0)), \quad r\_A(t) = -r\_B(t)$$

The forces exerted between the points are then

$$f\_{AB}(t) = -k\_h(r\_B(t) - r\_A(t) - 2r\_0) = -2k\_h r\_B(t)$$

and

$$f\_{BA}(t) = -f\_{AB}(t).$$

z<sup>s</sup> is given by

$$r\_s(t) = f\_{AB}(t)r\_B(t) + f\_{BA}(t)r\_A(t) = 2f\_{AB}(t)r\_B(t) \approx -\sin^2(t).$$

By integrating in time (.), we find that ν diverges, namely

$$\nu(t) \propto \int\_0^t \sin^2(\tau) \mathrm{d}\tau \propto t,$$

thus we have again a bounded oscillation for the material points on one hand, and an unbounded evolution on the other hand, if t .

An attempt to eliminate this inadequacy can be the decision to fix r<sup>A</sup> t r<sup>0</sup> and r<sup>B</sup> t r0, an internal constraint, indeed. In this case, in the simple D harmonic case that we treat here, the evolution of ν would result just governed by the forces determined by the internal spring, and it would be bounded and symmetric. But there would still exist an half-a-period delay between the actual overall shape of the system and the associated trend of ν at the continuum level. Namely, the evolution would be still monotonic during the elongation and contraction phases, obviously for the sign of the internal forces does not change (see panel c in Fig. .).

From now on, all results will refer to the just described approach, namely the choice r<sup>i</sup> t r<sup>i</sup> 0 , i 1, ..., m, with m the number of material points in the cluster, if not differently stated. However, in section . we provide also an example of monotonic evolution of ν for the Ubiquitin protein in the case this assumption does not hold, i.e. r<sup>i</sup> r<sup>i</sup> t .

An additional shortcoming suggested by this simple D analysis arises directly from the combination of equations (.) and (.). If we suppose that the same evolution is performed in different times (e.g. we perform a whole extension-contraction cycle once at velocity v<sup>0</sup> and once at velocity v<sup>0</sup> 2), although the same gross deformation occurs, different values of ν are detected. In fact, in the slower case the same trend of z<sup>s</sup> spreads over a time that is twice the time of the fastest one. Hence the integral results greater (doubled if the selfaction evolution is linear), as pointed out in panel d of Figure ..

Finally, it is obvious that if the biatomic D system considered in this section is kept elongated or compressed, its shape does not vary but, since the spring is loaded, z<sup>s</sup> is different from zero and ν evolves independently (data are not reported here).

If the interatomic potential among material points is changed to, for example, a Lennard-Jones one, as the one in (.), we do not find an improvement in the picture. Actually the very stiff Lennard-Jones potential makes it almost impossible to achieve compression without huge spikes in the evolution of ν (see panel a in Fig. .). Therefore, to perform such an analysis, two extra-particles

Figure .: Two-atoms system. Panel a: Trend of ν when the interaction between neighboring atoms is governed by a Lennard-Jones potential. The latter one is very stiff and the associated ν undergoes a very large peak in correspondence with the maximum compression. The decreasing trend is practically negligible. Panel b: Evolution of ν when external forces, due to the interaction between the system and the fictitious atoms that lead the deformation process, is considered in the computation of the self-action. Panel c: Dynamics of ν when, as above, the external forces are taken into account and the square-root of the reciprocal of the square root of the order parameter ν <sup>1</sup> <sup>2</sup> is the quantity chosen to describe the shape evolution of the system. Here the inner interaction potential is an harmonic covalent bond, but a similar outcome is detected also in case a stiff Lennard-Jones potential is enforced. Qualitative and quantitative agreements between ν and the radius of gyration of the D system are achieved.

are bonded through an harmonic potential to the atoms constituting the system. These fictitious atoms are driven at constant velocity instead of directly drifting the two inner points. In this approach we can either include or neglect the contribution of the external springs on the forces that act upon the inner atoms of the systems. Such a choice appears crucial as the external harmonic potentials end up governing the dynamics, and the related forces combined with the internal ones do practically change sign as soon as the motion is reversed (see panel b in Fig. .). This is confirmed not only for Lennard-Jones potentials but also when the inner forces are in turn harmonic potentials (see panel c, same figure). In this analysis (differently from panel b) we take the square-root of the reciprocal of ν, in analogy with the eigenvalues of the inertia matrix and the relevant inertia ellipsoid. This shrewdness allow us to achieve a trend in compliance with the radius of gyration and to avoid the amplitude mismatch observed in panel b. The order of magnitude of the quantities in comparison is also similar.

In this D example there is no influence of temperature on the scheme we are using, providing that the effect of the external springs are not accounted for in the force vectors determining zs. Once the latter is included to achieve a clean representation of the numerical results, as the one in the just mentioned panel c, the toy scheme seems much more sensible to thermal fluctuations and to their ability to spoil the picture, due to the origin of non-homogeneous deformation terms. However low-enough values of the temperature do not remarkably alter stretching results.

Figure .: First row: Dynamics of a homogeneous deformation for a four-atoms system. Each material point is pulled at constant velocity in an extension/compression cycle. Second row of pictures: Evolution of the eigenvalues λ<sup>i</sup> of the symmetric part of ν obtained by polar decomposition. The order parameter for the shape of the system never shrinks below the initial state despite the set of atoms undergoes a compression phase (third column from the left). This shortcoming is common to all the situations where external forces are absent (this is due to the direct assignment of constant velocity motions to the points of the system) or not considered in the computation of the self-action. Obviously, since the deformation process is symmetric, ν is a circle.

**D four-atom system.** Here we briefly mention the outcome of D simulations performed over a set of four atoms in space, placed at the corners of a square. Once homogeneous deformation is applied (all atoms are suitably driven by constant velocity motions) the overall basic picture of panel c in Figure . is achieved. In this elementary scheme all adjacent atoms are bonded by harmonic interactions. These ones are the sole actions of the model (see Fig. ., panel a).

In this case, ν reduces to an ellipse, a circle when, starting from a square configuration, the four atoms are driven homogeneously, maintaining the square symmetry. Figure . displays the evolution in space of the atoms in the square lattice and the corresponding coarse representation in terms of ν. The compression phase is not described appropriately because the actual value of ν represents an ellipse never smaller than the one corresponding to the initial condition. Also, at the end of a cycle in the simulation, the system is in a state equivalent to the initial one, while ν does not reach the initial value.

Previous results on the square lattice refer to a scheme where the reference positions of the atoms, i.e. the r<sup>i</sup> in (.), are kept constant to the initial values, and the interacting forces fij change. However, a similar outcome is detected also in the case the former quantities are let free to evolve, but the two eigenvalues

have different evolutions in magnitude. A similar non-symmetric behavior of the eigenvalues of ν occurs also if the number of internal constraints is changed.

Figure .: Panel a: Picture of the four-atoms set where an internal constraint is inserted. Panel b: Evolution of the semi axis of the ellipse for a homogeneous deformation totally equivalent to the one depicted in the first row of images of Fig. .. The presence of the extra spring alters the dynamics of ν, which semi axes λ<sup>1</sup> and λ<sup>2</sup> (eigenvalues of the symmetric matrix U stemming from polar decomposition of ν) follow different trends in spite of the symmetry of the motion.

Indeed, if an extra inner spring is added, as for example the one connecting atoms and in Figure ., panel a, but the constant velocity stretching protocol is not varied (the deformation process depicted in the upper row of panels in Fig. . still holds), we obviously get a different evolution of ν which changes from a circle to an ellipse (see Fig. ., panel b): the two eigenvalues of ν follow, in fact, different dynamics.

So, at least for systems composed by a small number of particles, the expression of z<sup>s</sup> that we have obtained does not allow the description in terms of ν to capture appropriately the behavior in compression. However, if we perform the deformation process by using fictitious points, properly connected by springs to the molecule (see the first panel in Fig. .), and we take into account the external forces generated this way, the results appear, at least qualitatively, acceptable. Also, they are qualitatively independent of the number of internal constraints, as summarized in Figure .. Finally, we investigate how the picture changes if we alter the interaction potentials. We enlarge first the number of atoms in the two-dimensional lattice, increasing them up to , and consider Lennard-Jones interactions for two different values of the cut-off radius (see Fig. .). For a cutoff R<sup>c</sup> 3.0A˚only adjacent residues generate an interaction. At R<sup>c</sup> 6.5A˚all atoms interact one another. In Figure . the system is also depicted in the maximum elongation and compression states, which correspond to labels () and () in panel a of Fig .. Only when the external actions (the ones exerted by the

Figure .: First row: Four-atoms system and fictitious beads driving deformation. The ghost atoms follow constant velocity motions and drift the system deformation thanks to harmonic external potentials. Maximum elongation states are also reported and labeled by () and () also in panel a. The overall shape of the system is here summarized by the magnitude of the edges of the limiting rectangle, bb<sup>x</sup> and bbz. The semi axes of ν should be in agreement with the latter ones. Panel a: Comparison between the trends of the limiting box and the ellipse semi axes, λ <sup>1</sup> <sup>2</sup> <sup>i</sup> (λ<sup>i</sup> are the eigenvalues of the symmetric part of ν by polar decomposition). Here external forces due to the ghost atoms and relative external springs are considered in the computation of zs. Instead, the inner extra spring is absent (neither in the system itself nor, accordingly, in the computation of the self-action). The qualitative agreement is considerable. Panel b: The qualitative agreement does not change if the inner extra spring is added to the system and to the computation of the self-action, differently from what have observed when external forces are not taken into account in the expression of the self-action (Fig..b).

fictitious material points) are included in the computation of zs, the variation in time of ν can be deemed in agreement with the real shape evolution of the lattice (the square-root of the reciprocals of the eigenvalues of the symmetric matrix U are again considered).

#### **..Three-dimensional atomic clusters**

Previous examples have been developed just to clarify the stage and to pave the way to more intricate three-dimensional simulations, which are, at the end, the realistic setting we have to face. Here we report the results obtained from a simple cubic lattice – two extra atoms are placed in the centers of upper and lower faces to simplify the implementation of constant velocity stretching simulations (the scheme is in Fig. .a) – and the protein Ubiquitin (see Fig. .).

Figure .: D eight-atoms system. The smaller cut-off radius used in the simulation is depicted to show that only adjacent points are bonded in this case, similarly to covalent bonding. The external fictitious apparatus and the limiting rectangular box are also easily detectable. Maximum stretching and compression are labeled with marks () and (). The latter ones can be found also in panel a of Fig. . in correspondence with the evolution of ν in these states.

**Cubic lattice.** Simulation data arising from the analysis of the cubic lattice sketched in Figure .a, confirm what has been shown for the two-dimensional systems.


Figure .: D eight-atoms system. Panel a: Comparison between the evolution of the bounding box edges and ν semi-axes. Here the picture refers to a set where short range (R<sup>c</sup> <sup>3</sup>.5A˚) Lennard-Jones potentials connect adjacent atoms. The deformation is enforced by external fictitious beads and springs. Such forces are not accounted for in the computation of zs. The evolution of semi-axes does not summarize alone the shape of the set of points (there is qualitative agreement only in the compression phase ()). Panel b: Once forces due to external springs are considered, the overall picture improves, providing that λ <sup>1</sup> <sup>2</sup> <sup>1</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> are taken as the semi-axes of ν. Panelsc and d: Similar statements hold true if <sup>R</sup><sup>c</sup> is greater (6.5A˚). If external forces are not considered, the evolution of ν degenerates into a D dynamics (only one of the eigenvalues actually evolves), panel c. Again, if we include external forces we recover accuracy and "robustness" in the scheme (panel d).

Figure .: Panel a: Comparison between the first eigenvalue of U in (.) and the evolution of the edge of the cubic set of atoms oriented along the z-axis. Stretching and compression are performed along the latter axis by means of ghost atoms and springs, see Fig. . for snapshots of the system during an equivalent deformation path. There is not a close correlation between the two quantities displayed and a high peak in the evolution of the eigenvalue is generated close to the compression phase, presumably as a consequence of the stiff Lennard-Jones potentials employed in this simulation (R<sup>c</sup> <sup>12</sup>A˚, all atoms interact with each other). Panel b: Trends of second and third eigenvalues of ν emerging in the simulation just described. If λ <sup>1</sup> <sup>2</sup> <sup>1</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> are considered, the qualitative agreement with the edges evolutions is considerable. Unfortunately this approach does not improve the picture of the first eigenvalue. Panel c: The same stretching protocol is applied to the system when short range Lennard-Jones interactions are considered (R<sup>c</sup> <sup>5</sup>.75A˚). Similarly to the D case, short range Lennard-Jones potentials generate an evolution of ν which is degenerate: in this case one of the three eigenvalues does not evolve and the picture associated with ν is two-dimensional. Also, the trends of the two eigenvalues are equivalent and do not allow to catch which sides of the real shape are shortening and which ones are stretching. Panel d: The interaction potentials are in this case harmonic bonds between adjacent atoms of the cubic lattice. The picture of ν is correct in the compression phase, that is in the central region of the plot 8000t<sup>u</sup> t 13000tu. During the stretching, instead, the three eigenvalues all follow a similar increasing/decreasing trend and none of them allow us to distinguish between which sides of the real shape are shortening and which ones are stretching. Taking the square-root of the reciprocals of all the eigenvalues cannot fix this inadequacy as all trends would be reversed.

Figure .: Panel a: Once external forces due to fictitious atoms and springs that are responsible for system deformation are included in the computation of the self-action, the overall picture results qualitatively in agreement with the dynamics of the threedimensional cubic set of atoms under scrutiny. The agreement appears when harmonic bonds or short Lennard-Jones interactions are considered, but it is not always the case for high values of Rc. Panel b: Temperature effects. Low-enough values of thermal oscillations (they do not alter the condensed state of the D matter) do not affect the time evolution of ν during the stretching phases, but generate trends that are not in agreement with the shape evolution of the systems when the external deformation mechanism is no more applied. A slight increasing trend in the values of λ<sup>1</sup> and λ<sup>3</sup> for t 15000t<sup>u</sup> can be evidenced, despite no overall deformation occurs.

Figure .: Panel a: General setting of the cubic lattice. Key quantities such as edge lengths and fictitious atoms, used to perform stretching by assignment of constant velocity motions along the z-axis, are explicitly depicted. Panels b and c: Snapshots comparing the evolution of the cubic lattice and the one of ν for a system that undergoes extension and compression phases. These images refer to a model in which harmonic potentials bond adjacent atoms and external forces have been used to compute the vectors fij in the expression of the self-action. The value of the material element volume e has been tuned to achieve comparable trends also from the quantitative point of view.

**Ubiquitin: mechanical stretching.** Ubiquitin protein is composed by residues. The consequent increment of degrees of freedom does not give rise to simulation results differing remarkably in the qualitative features from the ones obtained from the simple lattices considered so far.

In what follows, when we report results in the figures collected in this section, the values of the semi-axes of ν, evolving in time, are compared with the smallest rectangular parallelepiped including the protein. Its edges, namely bb1, bb2, bb3, are evaluated in the central principal frame of the molecular structure (a discussion on the rôle of the coordinate system is also in sec. .). Once again, the approach does not appear to be robust enough to achieve a proper description of the molecular shape unless we introduce (numerically) the external forces exploited in the deformation process, as it happens in the toy schemes discussed previously. However, we have no general proof that the rôle of the external forces in the simulation be always the one we have recognized here.

Figure .: Panel a: Ubiquitin stretching simulation where the protein is kept elongated at the end of detachment (t 5000tu). It is evident that the time evolution of ν is dominated by the trend of only one eigenvalue. The evolution of ν is not in agreement with the one of the molecule shape as λ<sup>1</sup> shows a monotonic trend when the molecule is simply oscillating in its linear conformation (T 0). Panel b: Once secondary structure terms in the overall potential (.) are neglected in the computation of the self action and the temperature is maintained equal to zero, a horizontal plateau is detected in the evolution of the main semi-axis of ν. The evolution of λ <sup>1</sup> <sup>2</sup> <sup>2</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>3</sup> well describe the trends of the x and y axes of the limiting parallelepiped and viewed in the principal central system of inertia of the protein (inset).

Figure . reports a comparison between two mechanical stretching simulations, where, at the end of detachment, the protein is kept elongated. To obtain an appropriate qualitative description of the molecular shape, it is necessary to neglect the contributions of the secondary structure in the computation of the vectors fij that determine zs, and to set the temperature equal to zero (see panel b in Fig. .). The need appears, obviously, because the elongated configuration is not an equilibrium one. Internal forces do not vanish – the protein oscillates driven by the ground state of the potential (.) – hence the self-action does not vanish and determines the evolution of ν (see panel a in Fig. .). Only when the contribute of such forces is neglected we obtain what is depicted in panel b of the same figure. We just notice that the results improve if we consider the square-root of the reciprocals of the second and third eigenvalues, differently from the first one, although there is no evident physical reason to justify such a choice. The picture holds true providing that peptide bonds are not loaded, otherwise the horizontal plateau in the trend of λ<sup>1</sup> would be impossible. Thermal fluctuations also compromise the evolution of ν in a similar way. Also, such a shortcoming cannot be easily eliminated by including the external forces by which we generate numerically the deformation path, as they would simply add a contribute to two of the vectors fij , which does not guarantee that z<sup>s</sup> would vanish. Actually there is only one eigenvalue of U that evolves appreciably and governs the evolution of ν, despite the analysis be three-dimensional.

In analyzing the whole protein, to obtain a clear picture of the evolution of the molecular shape it is not always sufficient to pull just one end of it and include the external force in the computation of zs. As it appears in comparing panels a and b in Figure ., only when both ends are pulled far apart from each other and the two external forces – the ones due to the springs that drag the molecule ends – are included together in the scheme (panel b), the picture is qualitatively in agreement with the real evolution of the protein shape (and it seems 'robust' as indicated by the additional examples in sec. .). Figure . compares the shape of the Ubiquitin with its coarse approximation in terms of ν. The approximation process based on the ideas underlying Cauchy-Born rule can be optimized by varying the amplitude of the normalization volume e. We can choose it, in fact, in order to achieve comparable amplitudes between the limiting parallelepiped and ν, with the possible need of a smoothing algorithm in post-processing.

The effect of temperature is usually negligible, during stretching and in the phase immediately consequent, but it influences the dynamics of ν in the long run, as we have already affirmed for the toy models analyzed previously. In the scenario described in this section, changes in the cut-off radius, frame of reference or stretching velocity do not induce prominent qualitative alterations.

#### **..Additional remarks**

As we have repeatedly affirmed, the continuum approach proposed here, with the specification of structure of z<sup>s</sup> in terms of the actions in the discrete model, seems to capture the dynamics of a single macromolecule in presence of constraints imposed by the environment. So, it can be used in representing the mechanical stretching of a protein in free space. For resultsrelevant to translocation dynamics in terms of the continuum model proposed here, see section ..

Figure .: Panel a: Data refer to a mechanical stretching simulation in which one end of the molecule is pulled at constant velocity by a fictitious bead and an external spring and the other end is held fixed in space. Despite the external force of the spring is included in the computation of the self-action, its presence is not enough to achieve an adequate representation of the molecule shape. Panel b: Only when mechanical stretching is performed pulling both ends with equal and opposite velocities, including both the two external spring forces in the self-action, the behavior in time of ν is in agreement with the real shape evolution of the protein. It is, as usual, necessary to take the square-root of the reciprocals of the eigenvalues as ν semi axes.

Figure .: Quantitative comparison between discrete Ubiquitin dynamics and the evolution of ν when both ends are pulled by external atoms. The material element volume has been optimized to allow the quantitative good agreement. Temperature is set equal to 0. Panel a: Beginning of mechanical pulling. Panel b: Maximum stretching state. Panel c: Conformation of the protein after maximum detachment has been achieved and the protein is left free to refold onto its native structure.

The scheme should be further enriched in case we would consider a deforming environment and/or the presence of a dense population of interacting macromolecules. In the latter case the balance of actions (.) should include a microstress, due to the interactions of each molecule with the neighboring ones. Moreover, ν should not necessarily be associated with a single molecule, rather it could be a representative 'in average' of a number of molecules occupying a portion of space considered as material element and assigned to a point in a multiscale and multi-field continuum representation of the macromolecular population (this way we should go along the guidelines of the mechanics of complex materials []).

In any case, the inadequacy in presence of molecular free motion remains, as it will be shown in the conclusive part of the present section.

It is possible to overcome it by adopting a second-rank tensor ν as a descriptor of the single molecule with a different meaning from the one assigned here, and selecting an expression for z<sup>s</sup> different from (.) but inspired by it. Possible approaches and related results shall be discussed in section .

**Effect of time-dependent placement vectors for Ubiquitin.** Figure . displays the trends of the semi axes of the ellipsoid associated with U when placement vectors in the explicit expression of the self action z<sup>s</sup> depend on time, meaning they are computed at every time step on the actual configuration of the protein, r<sup>i</sup> r<sup>i</sup> t . It is clear that the monotonic behavior, theoretically predicted from considerations on D systems, still holds for greater sets of atoms.

Figure .: Trends of the eigenvalues of U in a thermalization simulation for the Ubiquitin protein when placement vectors r<sup>i</sup> in z<sup>s</sup> vary in time. The behavior displayed by the analytical results for the biatomic system are confirmed also for the Ubiquitin protein. Data refer to an equilibration temperature T 0.5. The molecule maintains its folded state: the radius of gyration undergoes just small fluctuations around a constant value. The semi-axes of the morphological descriptor follow a monotonic evolution not in accordance with the protein shape dynamics.

**Frame of Reference.** A few remarks on the way we have used frames of reference in computations are collected here.

Figure .: Comparison between U semi-axes for stretching runs relevant to the PDB frame and to the central principal system (PRI) for the Ubiquitin protein. Evolutions can be considered equivalent, despite an opposite trend for the intermediate eigenvalue that, however, results nugatory for the final shape of the morphological descriptor. Data refer to a simulation where both terminuses are dragged with opposite velocities, T 0, and external fictitious forces are included in the computation of zs.

When we consider just the reference values of the placement vectors ri, under rotations R SO 3 of the ambient space (where we evaluate the vectors fij ) alone, z<sup>s</sup> becomes Rz<sup>s</sup> because the generic addendum fij r<sup>i</sup> defining it changes into Rfij ri, for r<sup>i</sup> is fixed once and for all in the reference space that is different from the ambient one and isomorphic to it, the isomorphism being simply the identification. In principle, we could also considerrotations R SO 3 in the reference space so that z<sup>s</sup> would change into RzsR <sup>T</sup> with the possibility of having RzsR<sup>T</sup> when R R. Tensor ν follows analogous rules, being the integral in time of negative zs, according to the evolution equation ν˙ zs.

In contrast, when we take r<sup>i</sup> in the current configuration, as in the previous example, under rotations R of the ambient space z<sup>s</sup> is mapped into RzsR<sup>T</sup> , and ν, evaluated this time in the current shape of the molecule, undergoes an analogous change.

Consequently, when we compute ν in one frame and want to know it in another one rotating in time, we have to know the whole time-history of the rotation up to the instant considered.

Here we have considered the frame of reference associated to the protein data bank file (PDB), which is the one where protein coordinates are provided and can be selected once for all at will in space. However, if we develop computations with respect to the principal inertial frame (PRI) of the molecule, the results do not alter the picture that we have shown in previous sections. Figure . shows that the overall shape of the molecular descriptor is equivalent in both cases, in the present example, if quantities are computed with respect to the protein data bank reference or to the central principal frame, despite an opposite trend in <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> . Broadly speaking, on average, differences in between the two cases can be more remarkable than what reported here, nevertheless a final argument to prefer the central principal system to the PDB one is not apparent from our analyses.

Figure .: Trends of the square root of the reciprocal of the eigenvalues of U in a thermalization simulation of the Ubiquitin protein. <sup>R</sup><sup>c</sup> <sup>3</sup>.0A˚ and <sup>R</sup><sup>c</sup> <sup>6</sup>.5A˚. The first three pictures show that the evolutions are practically the same. However, the protein with <sup>R</sup><sup>c</sup> <sup>3</sup>.0A˚ is unfolding, according to the comparison depicted in the last panel, where the trends of the radii of gyration of the two molecules is showed. The approach followed here does not allow to correctly detect thermal unfolding, a situation dominated by inhomogeneous deformation and random fluctuations that do not contribute to the overall shape evolution.

**Thermalization simulations.** For each cut-off radius considered in previous analyses we have performed thermalization simulations at T 0.50θ (in the already mentioned code units) to measure the evolution of the molecular descriptor associated with the protein shape. The protein is free to fluctuate in the ambient space, following the Langevin dynamics already mentioned, with no driving forces nor additional constraints. Each simulation starts from the crystallized structure of the protein according to the PDB file. Unfolding is then allowed, providing that the value of the cut-off radius be small enough. Protein configurations are also regularly sampled to obtain statistical independent initial states for mechanical stretching simulations. As it was expected, changes in the molecular shape, due to thermal equilibration, are not properly detected by the approach outlined in this work, since the deformation process is inhomogeneous and not remarkable (the radius of gyration of a thermally unfolded protein is only about two times greater than the one own of a compact structure). In addition, single-residue fluctuations play a prominent rôle at the atomic level, with-

out a corresponding macroscopic deformation. At the given equilibration temperature, structures with cut-off radius equal to 3.0A˚unfold, while if R<sup>c</sup> 6.5A˚ they do not. Figure . depicts these remarks for the Ubiquitin protein. The trends of the semi-axes of U are similar and mostly indistinguishable, despite a protein undergoes thermal unfolding (as it results from the trends in the radius of gyration). A similar behavior is found in all the equilibration simulations that we have performed.

Figure .: Panel a: Trends of the reciprocals of the square root of the eigenvalues of U in a stretching simulation of Ubiquitin compared with the dynamics of the smallest rectangular parallelepiped containing the molecule. The eigenvalues are here depicted as sequences of points, the values of the edge of the parallelepiped as continuous lines. <sup>R</sup><sup>c</sup> <sup>3</sup>.0A˚, <sup>T</sup> <sup>0</sup>, <sup>v</sup>str <sup>0</sup>.05. Data refer to a simulation performed with respect to the principal central frame. Panel b: Analogous data than in panel a but T 0.2. Molecular descriptor data are computed with respect to the PDB system of coordinates.

**Additional results for stretching simulations.** We present here some additional data which can indicate the statistical robustness of the results provided above, concerning the free-space stretching of the protein Ubiquitin when both ends are pulled at opposite constant velocities and external forces pertinent to the fictitious atoms are included in the computation of zs, see Figures .-.. The material element volume e has not been optimized for quantitative comparisons for the data below, as here we are just interested in furnishing a qualitative picture. Specifically, we have performed ten independent simulations for each cut-off radius (Rc), temperature (T), and stretching velocity (vstr) implemented (R<sup>c</sup> 3.0A˚or R<sup>c</sup> 6.5A˚, T 0 or T 0.2, vstr 0.05 or vstr 0.1). For all the cases, results can be deemed qualitatively equivalent to the few additional examples here provided.

Figure .: Panel a: Trends of the reciprocals of the square root of the eigenvalues of U in a stretching simulation of Ubiquitin compared with the dynamics of the smallest rectangular parallelepiped containing the molecule. The eigenvalues are here depicted as sequences of points, the values of the edge of the parallelepiped as continuous lines. <sup>R</sup><sup>c</sup> <sup>6</sup>.5A˚, <sup>T</sup> <sup>0</sup>, <sup>v</sup>str <sup>0</sup>.05. Data refer to a simulation performed with respect to the principal central frame. Panel b: Analogous data than in panel a but T 0.2. Molecular descriptor data are computed with respect to the PDB system of coordinates.

Figure .: Panel a: Trends of the average over runs of the reciprocals of the square root of the eigenvalues of U in stretching simulations of Ubiquitin (data relevant to the central principal frame). These quantities are compared with the average dynamics of the bounding box. The eigenvalues are here depicted as sequences of points, the values of the edges of the parallelepiped as continuous lines. <sup>R</sup><sup>c</sup> <sup>3</sup>.0A˚, <sup>T</sup> <sup>0</sup>.2, <sup>v</sup>str <sup>0</sup>.1. Panel <sup>b</sup>: Analogous average data than in panel <sup>a</sup> but now <sup>R</sup><sup>c</sup> <sup>6</sup>.5A˚, <sup>T</sup> <sup>0</sup>.0, <sup>v</sup>str 0.1. Molecular descriptor data are here computed with respect to the PDB system of coordinates.

#### **..Protein descriptor in MBP stretching simulations**

As already mentioned for the Ubiquitin protein, placement vectors r<sup>i</sup> refer to reference (equilibrated) configuration and do not depend on time. Here results relevant to the first stretching protocol are presented. In fact for the MBP, in contrast to what shown previously, we find it sufficient to pull just one end of the molecule (keeping the other terminus blocked in space) to gain an appropriate

picture of protein shape evolution. As usual, the external force generated by the ghost atom that drives the deformation pathway is included in the computation of the self-action and the square root of the reciprocals of the eigenvalues of U are again depicted as the semi axes of an ellipsoid.

In what reported in Figure . (N-terminus pulling, R<sup>c</sup> 7.5, T 0.75, vstr 0.1 tu), the material element volume e has been optimized for quantitative comparisons. In this case raw data are worked out to gain the trends and images from panel a to panel f. First, original evolutions are smoothed by supposing a process with memory. Namely a linear-weighted average over previous values is performed for each data. This way trends can be softened and, especially, λ <sup>1</sup> <sup>2</sup> <sup>1</sup> (the one that practically governs the description) is closer to bb<sup>1</sup> evolution, also in the equilibration part. Panels a, b and c report the outcome of the application of this algorithm to the λ <sup>1</sup> <sup>2</sup> <sup>i</sup> trends. Finally, to achieve the ellipsoid snapshots only (panels d, e and f) a magnification coefficient equal to is used to recover a high-enough value of the first peak, as it can be noted that it is considerably reduced by the smoothing. No magnifications or reductions are used for λ <sup>1</sup> <sup>2</sup> <sup>2</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>3</sup> , as the average trend of those is deemed appropriate. To get the pictures, the center of the ellipsoid is conveniently attached to the center of mass of the molecule and its orientation is governed by the eigenvectors of U. Thanks to these expedients, the description appears appropiate, despite the evolution of the medium and smaller axes may be deeper.

We add that in the present approach the equivalent radius of the sphere having the same volume of the ellipsoid follows a trend dominated by the first eigenvalue, that is a trend qualitatively equivalent to the radius of gyration one (the volume increases in a stretching simulation).

#### **.. MBP morphological descriptor translocation results**

In translocation simulations we find that the presence of the pore, which constitutes a constraint that limits accessibility in phase space, enhances the quality of the protein shape description. To the sake of clarity, in the present approach, both the pulling force and the actions, due to the channel potential, are included in the computation of the z<sup>s</sup> as external forces. The confinement acts similarly to a decrement of the temperature, limiting residue fluctuations inside the pore. Obviously, to achieve a proper outcome, either the volume of the material element e needs to be small or the pore long enough to provoke a considerable stretching, detectable by the eigenvalues of U. The issue is well summarized in Figure ., where results related to two different pore lengths are depicted, holding the material element volume constant at a standard value (equivalent sphere radius: 30A˚). Thus, once again, main limitations are due to the necessity to fine-tune such a volume on the basis of the simulation performed, or rather, according to a sort of 'maximum value of deformation'.

# Coarse-grained molecular dynamics and continuum models

Figure .: Raw and elaborated data are shown for a stretching simulation of Maltose Binding protein. As evident from the first row, to gain a better agreement between peak magnitudes, e should be further reduced, but this would generate instabilities in the routines used for polar decomposition. When depicted, protein dimensions are normalized by the radius of gyration own of the PDB crystallized structure. Panel a: Trend of the original λ <sup>1</sup> <sup>2</sup> <sup>1</sup> as resulting from fine tuning of e . The peak is high, reaching a value eleventh times greater than the initial one. Its evolution is also tight and sharply decreases during thermalization. Panel b: Despite the reduction in the value of e the drop at maximum detachment does not overcome % of the initial value and is equal to % at the end of the simulation: the dull λ <sup>1</sup> <sup>2</sup> <sup>2</sup> dynamics is a limitation of the model. Panel c: The minimum value at 10000t<sup>u</sup> corresponds to a drop of % of the initial value and has to be compared to a % reduction in bb3, so the agreement is not very close in magnitude. Panel a: λ <sup>1</sup> <sup>2</sup> <sup>1</sup> trend resulting from a weighted running average. Specifically, the weights are linear (from to ) and spread over the current value (weight = ) back to the one-thousandth previous one (weight = ). The approach allows improving the agreement between λ <sup>1</sup> <sup>2</sup> <sup>1</sup> and bb1, especially in the thermalization part. Panel b and c: Same approach just described applied to λ <sup>1</sup> <sup>2</sup> <sup>2</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>3</sup> . A similar smoother behavior is detected and amplitudes are reduced. Panel d: Snapshot of the ellipsoid and molecule at the beginning of the pulling. The protein is compact and the ellipsoid is close to a sphere. Here and in panels e and f a magnification coefficient is applied to the ellipsoid semi axis to recover the amplitude of the peak smoothed by the averaging process. Panel e: Ellipsoid and protein near the maximum value of the stretching. A remarkable elongation is detected in agreement with the almost linear fashion of the polymer. Panel f: Snapshot taken at the end of equilibration. The morphological ellipsoid shrinks still too quickly from its maximum elongation when compared to its shape in panel d.

Panel a, b and c underline the difference in the amplitudes of the square root of the reciprocals of the eigenvalues once the channel is elongated from 100A˚to 600A˚. Panels d through f compare the MBP shape and the ellipsoid associated withU once projected onto the plane formed by eigenvectors and (600Å channel). The choice of the plane is not by chance, as again we observe that eigenvalues and govern the dynamics, while the second one experiences just little fluctuations. The outlined approach confirms its appropriateness to summarize shape evolutions for a wide range of simulation parameters in constrained dynamics. We performed several transport simulations across a nanopore by varying the cut-off radius, pore length, temperature, driving force (always considerably great, to perform translocation in a limited time window) and pulling terminus, without remarkable limitations in the description (providing that the transport be homogeneous in time).

However, the model is sensitive to the value of the pulling force: a greater value causes higher values of interatomic forces fij , which are detected by zs. Hence, two translocation simulations, which are equivalent from a macroscopic deformation point of view (same channel), just performed at different driving forces, are described by different ν. The one relevant to the greater value of the pulling force generates the ellipsoid that stretches the most, whereas instead the gross deformation of the proteins is practically the same. A similar behavior is also noticed in stretching runs when detachment velocity is increased.

#### **. Additional continuum formulations**

As we have seen, the proposed approach based on Cauchy-Born rule and power equivalence, appears to be robust enough only when external forces are included in the computation of zs. Also, the dynamics has to be, in a sense, driven, constrained, by the external environment. However, additional preliminary results (not shown here since they are deemed unreliable as a proper statistics of event has not been collected yet) seem to show that external forces might not be essential for MBP stretching and translocation simulations. In other words the effect of the number of degrees of freedom of the system might require further investigations.

What appears evident are the limitations of the model to capture thermal unfolding. In thermal simulations the homogeneity is lost due to random singlebead oscillations that overwhelm and hide the overall unfolding process. The drawback can be overcome by selecting different descriptors of the protein evolution and related modified expressions of zs, which arise from evaluations of the interaction mechanisms among material points in the molecule.

## Coarse-grained molecular dynamics and continuum models

Figure .: Translocation simulations for R<sup>c</sup> 3.0, C-terminus pulling, T 0.75, F 2.0. Panels a, b and c: Trends of the reciprocals of the eigenvalues for two values of the pore length (100 and 600Å). Despite a lower overall deformation if compared with the stretching simulations, the presence of the pore allows a better description of the molecule shape evolution. Especially λ <sup>1</sup> <sup>2</sup> <sup>3</sup> (panel c) appears to be particularly sensitive to the confinement, its decreasing trend is considerable, reaching a minimum equal to about one half of the initial value, regardless the pore length. λ <sup>1</sup> <sup>2</sup> <sup>1</sup> is instead influenced by the latter quantity (panel a), being negligible in the 100Å pore. Finally, the evolution of the second eigenvalue, although again pretty limited, is remarkable and usually greater than in the steered-dynamics case. Panels d, e and f: Direct comparison between the ellipsoid projected onto the planes associated with eigenvectors and and the molecule seen in the xz plane of the PDB system of coordinates. The quality of the description is considerable regardless the frame of reference employed. Due to the simulation protocol (alignment of the PBD x-axis with the pore axis and therefore with the pulling force direction), all the frames are very close to each other.

#### **..Varying the meaning of** ν

We change here the meaning of ν and consider it as a work tensor density performed 'in average' in deforming the whole protein. Also here we assume det ν 0. This way ν˙ is a power and it is supposed to be equal to the power density in the discrete model. The equation ν z<sup>s</sup> is now a tensor power balance. The structure of zs, here indicated by zs<sup>2</sup>, is that of a tensor power density. We express (heuristically) zs<sup>2</sup> as the sum of the tensorial product between the absolute value of the total force fij exerted between residues i and j and the time rate of change of the absolute value of the placement vector of the material

points with respect to the center of mass of the molecule, ∂<sup>t</sup> r<sup>i</sup> . It is important to point out that here indicates the absolute value of the components of the considered quantity, that is x is a vector which entries are the absolute values of the entries of vector x, it is not its norm, which in turn would be indicated by the symbol x . In a sense the dynamics here is like projected onto the first octant of the Cartesian space R<sup>3</sup>, through proper reflections (associated with the absolute value operator) that allow us to distinguish between departure or approach of a residue with respect to the center of mass, regardless their mutual placement in the frame of reference selected to assign point-coordinates. We write

$$z\_{s2} = -\frac{1}{|\mathfrak{e}|} \sum\_{i \neq j \in \mathfrak{e}} k\_i^{z\_{s2}} |f\_{ij}| \otimes \partial\_t |r\_i|. \tag{5.7}$$

In the above expression k<sup>z</sup>s<sup>2</sup> <sup>i</sup> is a dimensionless scalar quantity defined as the ratio between the actual distance of the point of application of the force from the center of mass of the molecule, r<sup>i</sup> , and the radius of gyration of the crystallized PDB structure of the protein, r<sup>0</sup> gir:

$$k\_i^{z\_{s2}} = \frac{\left\|r\_i\right\|}{r\_{gir}^0}.$$

The action fij involves all the elementary terms present in the Go-like model, ¯ such as actions stemming from peptide bonds, bending deformation and so on, equivalently to the previous approach. As evident from (.), not only the square root of the reciprocal of the eigenvalues of the symmetric matrix due to polar decomposition of ν are analyzed, but also the diagonal entries of ν are investigated and a suited ellipsoid built from them.

**Thermalization simulations results.** For a matter of brevity, we show only thermalization simulation results. In fact only in this case the first approach considered in the present chapter is not appropriate. However, similar outcomes (in terms of the quality of the overall picture) hold also for stretching and translocation simulations. Nevertheless, the description of thermal unfolding is the toughest task. The approach analyzed here is mainly tailored to follow protein deformation when diagonal components of ν are considered. However, also symmetric matrix eigenvalues provide an adequate geometrical description of the molecule.

In Figure . we show the trends of ν diagonal entries for a molecule with cut-off radius equal to 7.5A˚, while undergoes equilibration. As already shown (Fig. .), since the simulation temperature is 0.75, MBP remains compact. Also, that behavior can be easily detected by the lack of increment in the bounding box edge lengths, as they simply oscillate around constant values regardless the frame of reference used (all panels of Fig. .). In panels a, b and c we compare the trends of the square root of the reciprocals of the eigenvalues ofU with the evolution of the sides of the limiting rectangular parallelepiped in the central principal frame, despite the simulation runs with respect to the PDB system of coordinate. We do this because we believe that the description by means ofsymmetric matrix eigenvalues should be more general than the one based on the diagonal entries of ν. All the panels show a good agreement between the two trends reported in each of them. The eigenvalues do not show the almost monotonic trends as the ones in Figure .. Rather, they randomly fluctuate around the initial values, a behavior in accord with the bounding box one. However, fast spikes are not reproduced and sometimes a sort of opposite behavior is detected. Panels d, e and f report the evolution of ν diagonal entries and bounding box edges in the PDB frame of reference (bbx, bby, bbz). ν diagonal entries are indeed tied to the system of coordinates where quantities are evaluated. It is evident how much these trends faithfully reproduce the evolution of the molecule shape even in such a random dynamics. The magnitude of the oscillations can be easily changed by altering the value of the material element volume once the approach here analyzed is chosen. Fast spikes are closely matched. Such an agreement holds true for all the simulations we have performed, regardless the value of temperature and cut-off radius. Therefore the associated ellipsoid always describes the protein shape in a pointwise fashion, not only in average. Unfortunately, neither the eigenvectors of ν nor of U, (not even the rotation matrix arising from polar decomposition) provide a straightforward way to understand the orientation of the polymer. Also, in this particular case, since the molecule remains folded, ν diagonal components are really able to follow the details of the deformation pathway (bounding box oscillations are only a few Angstroms), which is not macroscopic. To understand what just stated, see Figure .. It clearly shows the ability of the current approach to distinguish between a protein that remains compact in a heat reservoir and one that instead unfolds. Indeed, panels a through c compare three quantities: square root of the reciprocals of the eigenvalues of U for a 3.0Å cut-off radius molecule, the relevant evolution of the bounding box sides and the square root of the reciprocals of the eigenvalues of U for a protein that remains folded. First of all we underline that the agreement between eigenvalues and associated edges of the limiting rectangular parallelepiped is appropriate on an average sense, since bounding box spikes are not so closely followed by ellipsoid semi-axes. However, the magnitude of the deformation is clearly detected: fluctuations in the evolution of <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>i</sup> associated with the folded molecule are hardly visible. Such quantities remain practically flat and very close to the initial value in a macroscopic standard, that is the one determined by the fluctuations belonging to the 3.0Å cut-off radius MBP. It is evident that we can recognize that the latter one is undergoing big deformations (thermal unfolding), while the former one practically remains in a native-like conformation. Also, now it is clear what we meant about the ability of the selected approach to follow the little details of the deformation process, even the smallest changes in the shape are

Figure .: Thermalization simulation results for MBP. R<sup>c</sup> 7.5, T 0.75. The protein remains folded with these parameters. Panels a, b and c: Comparison between the bounding box side lengths and the square root of the reciprocals of the eigenvalues of U. The former are computed in the principal central frame, the latter are relevant to the polar decomposition of ν evaluated in the PDB coordinate system. In panel a the highest-valued eigenvalue trend is shown, in panel b the middle-valued one and, accordingly, in panel c the lowest. All three show a remarkable agreement with the trends of the bounding box edges, especially considering that are relevant to different frames, in compliance with the need that the eigenvalues of the symmetric matrix should have a general meaning. Panels d, e and f: Comparison of ν diagonal components and bounding box sides in the PDB frame. The agreement is in all cases at pointwise level, spikes are correctly reproduced, despite the gross deformation of the molecule is negligible as no thermal unfolding takes place.

recorded and are scaled correctly, providing that the material element volume of the compared evolutions be the same.

The same picture is confirmed in panels d, e and f of Figure . where diagonal components are considered. Also in this case unfolding is clearly detected. Moreover, the agreement between the ellipsoid semi axes and the bounding box sides appears adequate not only on average but pointwise. If the ellipsoids are plotted, the difference in their dimensions is evident and we can detect which molecule is unfolding, as shown in Figure .. Data used for this figure are the ones also reported in Figure ., when the difference between the first eigenvalues is close to the maximum one.

#### **..** z<sup>s</sup> **model and \***

**Explicit formulation.** We can also choose different expressions of z<sup>s</sup> as explicit constitutive structure. Two of them are discussed here. The first one, indicated

Figure .: Thermalization simulation results for MBP. <sup>R</sup><sup>c</sup> <sup>3</sup>.0A˚, and <sup>R</sup><sup>c</sup> <sup>7</sup>.5A˚, T 0.75. The molecule with the lower R<sup>c</sup> unfolds, the other one remains compact. The material element volume is the same for the two cases, equal to a sphere of radius <sup>20</sup>A˚. Panels a, b and c: Comparison between the bounding box side lengths (solid dark line) and the reciprocals of the eigenvalues of the <sup>3</sup>.0A R˚ <sup>c</sup> molecule (dotted line). Also the reciprocals of the eigenvalues of the <sup>7</sup>.5A R˚ <sup>c</sup> protein are showed for a quantitative comparison (dashed line). The agreement between the <sup>3</sup>.0A˚case and relevant bounding box sides is appropriate in all panels. Also, the current approach is able to distinguish between a molecule that undergoes thermal unfolding and one that does not, accordingly to the difference in λ <sup>1</sup> <sup>2</sup> <sup>i</sup> magnitude. Panels d, e and f: Comparison of ν diagonal elements and limiting parallelepiped edges in the PDB frame of reference for the <sup>R</sup><sup>c</sup> <sup>3</sup>.0A˚simulation. The agreement is appropriate in all cases and even fast spikes are closely followed.

by zs<sup>3</sup>, reads

$$z\_{s3} = -\frac{1}{|\mathfrak{e}|} \sum\_{i \neq j \in \mathfrak{e}} k\_{ij}^{z\_{s3}} (f\_{ij} \cdot l\_{ij}) \iota \otimes \partial\_t |r\_j|. \tag{5.8}$$

Here lij is a dimensionless vector defined as

$$d\_{ij} = \left( r\_{ij} \cdot \frac{r\_{ij}}{\|r\_{ij}\|} \right) \frac{r\_{ij}}{\|r\_{ij}\|},\tag{5.9}$$

where rij is the position vector of material point j with respect to i (note that rij rji); k<sup>z</sup>s<sup>3</sup> ij is a dimensionless quantity defined by

$$k\_{ij}^{z\_{\pi3}} = \frac{\|\boldsymbol{r}\_{ij}\| - \|\boldsymbol{r}\_{ij}^{0}\|}{\|\boldsymbol{r}\_{ij}\| - \|\boldsymbol{r}\_{ij}^{0}\|} \frac{\|\boldsymbol{r}\_{j}\|}{\boldsymbol{r}\_{gir}^{0}},\tag{5.10}$$

Figure .: Comparison of the ellipsoids (and associated proteins) for the two cases already compared in panels a, b and c of Figure .. It is apparent which is the molecule unfolding just from the difference in the shape of the ellipsoids, which refer to the square root of the reciprocals of the eigenvalues of U. Snapshots are taken when the difference between the first eigenvalues is close to the maximum one, according to Figure .a. The palette indicates the z-axis values of the upper half portion of the ellipsoids. Protein dimensions are normalized by r<sup>0</sup> gir.

where, r<sup>0</sup> ij is the placement vector of residue j referred to residue i in the reference configuration of the molecule and other symbols have been already defined. Finally ι is a vector the entries of which are all unity.

In this approach the first quantity involved in the tensor product is an effective measure of the local action between amino acids i and j, while the second quantity, once summed upon all points and integrated in time, is a gross measure of the evolution of the distances within the molecule thought as a whole. Since the actions practically result from the projection of the forces exchanged between amino acids i and j onto the vector joining them, only peptide interactions and native contacts (Lennard-Jones potentials) are considered in the computation of zs3. The essential idea behind this approach is that ∂<sup>t</sup> r<sup>j</sup> governs the evolution of zs3. Therefore the whole left side term of the tensorial product, i.e. k<sup>z</sup>s<sup>3</sup> ij fij lij ι is tailored in a way that it has a constant sign when the forces are generated by harmonic or harmonic-like potentials, as peptide bonds or Lennard-Jones interactions, respectively. Here 'harmonic-like potentials' means potentials that are attractive in a region of the phase space and repulsive in the complementary one. For a matter of clarity we shall refer to this approach as SAM in the present section.

An alternative formulation that stems from this model and actually reduces the whole evolution of ν to the evolution of its first diagonal entry, is subsequently described and will be addressed to as SAM\*. This approach allows one to drop the absolute value of the quantity r<sup>j</sup> in the time partial derivative of (.), without losing the ability to discerning between detachment or approach of an amino-acid with respect to the center of mass. The aim is accomplished by simply evaluating the vector r<sup>j</sup> in a local frame of reference that has its x-axis coincident with the vector itself, namely each residue has a local proper frame of reference where its placement vector (with respect to the center of mass) is evaluated. In the specific case, the placement vector itself fixes the x-axis of such a coordinate system. Thence only the first entry of r<sup>j</sup> is different from zero and so only the first diagonal component of zs<sup>3</sup> evolves. In practice, the order parameter is reduced from a second-rank tensor to a scalar. However, tensor rank can be easily restored claiming a volume-preserving deformation: in this way the other two diagonal entries of ν can be deduced by requiring that the ellipsoid associated with them would have a constant volume, equal to the initial one (the volume of the material element). Since there is no preferred assumption, in the volumepreserving enforcement the two indeterminate axes (i.e. the ones that are not governed by the self-action) are set equal to each other. Similarly to the second approach introduced in section ., both the square root of the reciprocals of the eigenvalues and ν diagonal entries are considered for SAM, while for SAM\* only the latter ones are evaluated.

**Main results.** In this section all results for SAM and SAM\* are briefly outlined. Generally speaking, the formulations analyzed here require the computation of several additional quantities or extra frames for every time-step during a simulation, resulting more time-consuming than the two previous ones. For this reason, no translocation simulations have been performed. Moreover, the already presented outcomes are deemed appropriate for the purpose of the present investigation. However, a few results are pointed out as the quality of the description for both SAM and SAM\* is adequate. As far as (.) is concerned, the diagonal components of ν provide a clear description of the MBP shape in all the runs performed, as it was expected. In this case, the value of e can be very small and so ν entries very large. As it results from Figure. ., ν 1, 1 leads the dynamics and closely follows the trend of the associated bounding box edge in the central principal frame for an AFM-like stretching simulation. The evolution of the remaining two entries is negligible, although panel b and c highlight the fact that their trends are not so close to the edges of the limiting parallelepiped. Usually the appropriateness of the picture improves in the PDB system of coordinates.

We find that only two of the eigenvalues of U evolve, as one of them is constantly equal to 1 regardless the involved dynamics. Hence the picture associated with the λ <sup>1</sup> <sup>2</sup> <sup>i</sup> is just two-dimensional, as show in Figure .d and relative inset. This D description is in a sense in accord with the evolution of the gross shape of the molecule (as basically one eigenvalue forms a peak and the other one a minimum), but was not expected.

As already stated, it is in the PDB frame of reference that SAMworks better. As it is depicted in Figure ., ν diagonal entries closely follow the evolution of the bounding box edges during a thermalization simulation. The agreement is considerable and the curves reported in both panels a and b are hardly distinguishable (this holds true also for the diagonal component not depicted). Quantitative agreement is just a matter of fine tuning the material element volume.

SAM\* approach generates a D evolution of the self-action and therefore also of the associated ν. However, the D evolution can be recovered artificially claiming a volume-preserving transformation for the ellipsoid, as already mentioned. In such a way it is probably obtained the most appropriate description of all the ones presented in the thesis, as the constant-volume assumption allows both <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>3</sup> to considerably decrease from the initial value. The ellipsoid picture is drafted in Figure . for a stretching simulation, considering both the trends of the square root of the reciprocals of the eigenvalues (upper picture) and a snapshot of the system close to the maximum elongation of the molecule (panel b). From the latter panel it is immediately appreciable the quality of the description as the almost single file conformation of the MBP gives rise to an ellipsoid that is an extremely thin and long spindle.

#### **..Remarks**

Three additional formulation of z<sup>s</sup> have been considered to overcome the limitations of the basic approach. Among them, the first one (sec. .) shows the best premises, as it is computationally inexpensive and allows a proper description of the molecule shape in all the simulations performed. SAM and SAM\* approaches require the computation of several additional quantities at every time step, considerably slowing down the efficiency of the numerical code. Also, the symmetric matrix for SAM results two-dimensional, regardless the simulated dynamics. SAM\*, when associated to a volume-preserving deformation requirement, generates an evolution of the ellipsoid in close agreement with the real deformation of MBP. However, no translocation simulations have been performed for SAM and SAM\*. Also, systems smaller than the MBP have not been studied in this frameworks. However, there are no doubts that the appropriateness of the approaches here analyzed will not be compromised by the dimensionality of the system, since the associated z<sup>s</sup> formulations arise from considerations on deformation mechanisms between two single material points.

What is necessary to further develop are the physical consequences of selecting empirical formulations for zs, as the description of the material element shape is only the first step to be undertaken in the building up of a continuum model for a complex body.

Figure .: Mechanical stretching simulation for MBP. Panel a: Comparison between ν 1, 1 and the bounding box x-axis edge in the central principal frame, bb<sup>1</sup> . ν entry ends up leading the whole dynamics of the ellipsoid and it shows a trend totally in compliance with the gross shape evolution of the MBP. Panel b and c: ν 2, 2 and ν 3, 3 components compared with the associated edges. The trends are not so in agreement as in panel a, but it is appropriate to point out that amplitudes are both negligible if compared with ν 1, 1 . Panel d: This panel and the relative inset shows the trends of the two evolving symmetric matrix eigenvalues. The third eigenvalue (not shown) is always constant, independently from the performed simulation. Hence the ellipsoid evolution associated to U is only D for SAM. However λ <sup>1</sup> <sup>2</sup> <sup>1</sup> correctly reproduces the elongation while λ <sup>1</sup> <sup>2</sup> <sup>3</sup> depicts the shrinking of the shape. We note that symmetric matrix eigenvalues undergo oscillations that are considerably smaller than the ones of the diagonal entries of ν.

Figure .: Panel a: Here the trends of ν 1, 1 and the x-axis edge of the bounding box in the PDB frame are compared. Trends are so close to each other that a visual distinction is tough. It is in the PDB frame of reference that SAM achieves the best description of the MBP shape as showed in this case for the challenging thermalization case. Panel b: Similarly to panel a, ν 3, 3 and bb<sup>z</sup> are compared, pointing out again the close similarity. ν 2, 2 and bb<sup>y</sup> trends are not shown but their agreement is comparable with the ones reported in this figure.

Figure .: Panel a: Trends of the square root of the reciprocals of the eigenvalues associated to SAM\* for a mechanical stretching simulation of MBP. λ <sup>1</sup> <sup>2</sup> <sup>2</sup> and <sup>λ</sup> <sup>1</sup> <sup>2</sup> <sup>3</sup> are obtained claiming a constant volume deformation for the ellipsoid and are set equal to each other. The trends result in close agreement with the overall shape deformation of the protein in a AFM-like stretching simulation. Panel b: Visual comparison between the almost linear shape of the MBP at maximum elongation and the associated ellipsoid, which has been obtained from the data of panel a at maximum stretching multiplied by the radius of gyration of the native MBP (also, a magnification coefficient of 2 for the semi axes has been necessary, although a fine tuning of e would have worked the same). The affinity of the two shapes is here substantial as the constant volume assumption allows to achieve an adequate shrinking of both the transverse axes of the ellipsoid.

# **References**


Coarse-grained molecular dynamics and continuum models

cell-free membrane patches. «Pflügers Archiv European Journal of Physiology», ():–, .


lumen. «J. Am. Chem. Soc.», ():–, .


Coarse-grained molecular dynamics and continuum models


and experiment. «J. Chem. Phys.», ():–, .


Coarse-grained molecular dynamics and continuum models



Coarse-grained molecular dynamics and continuum models



#### premio tesi di dottorato

#### anno 2007

Bracardi M., *La Materia e lo Spirito. Mario Ridolfi nel paesaggio umbro*


#### anno 2008

Bemporad F., *Folding and Aggregation Studies in the Acylphosphatase-Like Family*


Colica G., *Use of Microorganisms in the Removal of Pollutants from the Wastewater*

Gabbiani C., *Proteins as Possible Targets for Antitumor Metal Complexes: Biophysical Studies of their Interactions*

#### anno 2009

Decorosi F., *Studio di ceppi batterici per il biorisanamento di suoli contaminati da Cr(VI)*


Pace R., *Identità e diritti delle donne. Per una cittadinanza di genere nella formazione*

Vignolini S., *Sub-Wavelength Probing and Modification of Complex Photonic Structures*

#### anno 2010

Fedi M., *«Tuo lumine». L'accademia dei Risvegliati e lo spettacolo a Pistoia tra Sei e Settecento*

Fondi M., *Bioinformatics of genome evolution: from ancestral to modern metabolism. Phylogenomics and comparative genomics to understand microbial evolution*

Marino E., *An Integrated Nonlinear Wind-Waves Model for Offshore Wind Turbines*

Orsi V., *Crisi e Rigenerazione nella valle dell'Alto Khabur (Siria). La produzione ceramica nel passaggio dal Bronzo Antico al Bronzo Medio*

Polito C., *Molecular imaging in Parkinson's disease*

Romano R., *Smart Skin Envelope. Integrazione architettonica di tecnologie dinamiche e innovative per il risparmio energetico*

#### anno 2011

Acciaioli S., *Il* trompe-l'*œil letterario, ovvero il sorriso ironico nell'opera di Wilhelm Hauff*

Bernacchioni C., *Sfingolipidi bioattivi e loro ruolo nell'azione biologica di fattori di crescita e citochine*

Fabbri N., *Bragg spectroscopy of quantum gases: Exploring physics in one dimension*

Gordillo Hervás R., *La construcción religiosa de la Hélade imperial: El Panhelenion*

Mugelli C., *Indipendenza e professionalità del giudice in Cina*

Pollastri S., *Il ruolo di TAF12B e UVR3 nel ciclo circadiano dei vegetali*

Salizzoni E., *Paesaggi Protetti. Laboratori di sperimentazione per il paesaggio costiero euro-mediterraneo*

#### anno 2012


Bondì D., *Filosofia e storiografia nel dibattito anglo-americano sulla svolta linguistica*


Petrucci F., Petri Candidi Decembrii *Epistolarum iuvenilium libri octo.* A cura di Federico Petrucci

Ramalli A., *Development of novel ultrasound techniques for imaging and elastography. From simulation to real-time implementation*

#### anno 2013

Bacci M., *Coarse-grained molecular dynamics and continuum models for the transport of protein molecules*

Brancasi I., *Architettura e Illuminismo. Filosofia e progetti di città nel tardo Settecento francese*

Cucinotta E., *Produzione poetica e storia nella prassi e nella teoria greca di età classica*

Locatelli M., *Mid infrared digital holography and terahertz imaging*


Muniz Miranda F., *Modelling of spectroscopic and structural properties using molecular dynamics*

Pellegrini L., *Circostanze del reato: trasformazioni in atto e prospettive di riforma*