# A half century of Romance linguistics

Selected proceedings of the 50th Linguistic Symposium on Romance Languages

Edited by Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio

#### Open Romance Linguistics

#### **Editors:**

Lorenzo Filipponio (HU Berlin), Richard Waltereit (HU Berlin), Esme Winter-Froemel (Julius-Maximilians-Universität Würzburg), Anne C. Wolfsgruber (HU Berlin)

#### **In this series:**



Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.). 2023. *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages* (Open Romance Linguistics 2). Berlin: Language Science Press.

This title can be downloaded at: http://langsci-press.org/catalog/book/369

© 2023, the authors

Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/

ISBN: 978-3-96110-405-5 (Digital), 978-3-98554-063-1 (Hardcover)

DOI: 10.5281/zenodo.7525084

Source code available from www.github.com/langsci/369

Errata: paperhive.org/documents/remote?type=langsci&id=369

Cover and concept of design: Ulrike Harbort

Typesetting: Luis Avilés González, Sebastian Nordhoff

Proofreading: Agnes Kim, Damien Hall, Dominic Schmitz, Elliott Pearl, Hella Olbertz, Jeroen van de Weijer, Jean Nitzke, Rebecca Madlener, Valeria Quochi

Fonts: Libertinus, Arimo, DejaVu Sans Mono

Typesetting software: XeLaTeX

Language Science Press, xHain, Grünberger Str. 16, 10243 Berlin, Germany

http://langsci-press.org

Storage and cataloguing done by FU Berlin

### **Contents**




## **Acknowledgments**

This material is based upon work supported by the National Science Foundation under Grant Number 1918245. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

## **The digital transformation of the LSRL: The first 50 years of Romance linguistics in the Americas ends virtually**

Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio<sup>a</sup>

<sup>a</sup>The University of Texas at Austin

Widely considered to be the premier event in Romance linguistics worldwide, the Linguistic Symposium on Romance Languages (LSRL) offers a venue for the dissemination of the results of state-of-the-art research in linguistics as it is applied to the Romance languages. In hosting LSRL 50, we aimed to highlight innovative approaches to problems in Romance linguistics; provide a forum in which scholars of different orientations could communicate with one another; showcase research that uses different types of data and methodologies; bridge linguistics with the STEM fields; promote a culture of shared tools and data; and actively involve students in conference activities. We reached our goals in the end, but our path toward satisfying them was more complex than we could ever have anticipated. This chapter traces that path, surveys the many intellectual contributions that are contained within this volume, and offers our acknowledgments to those who helped us achieve our ends along the way.

#### **1 #50 meets a (surmountable) obstacle**

The LSRL had been hosted, in person, on a variety of campuses in the Americas on an uninterrupted basis since 1971. (In 2023, it is slated to be hosted in Paris, France, leaving the Americas for the first time after half a century.) Remarkably, there is no society or board behind this endeavor; instead, the tradition of LSRL is upheld from year to year by faculty and student volunteers who demonstrate their commitment to the promotion of linguistic research on Romance languages by sponsoring the conference on their campuses. LSRL 50 was to be hosted, for a record fourth time in its first half century, on the campus of the University of Texas at Austin, April 23–25, 2020. The program had been set, the travel arrangements for our plenary speakers were complete, coffee and breakfast tacos had been ordered for breaks, a poolside reception and a banquet had been planned, and the early registration period had concluded with nearly 200 attendees prepared to join us in Austin. However, on March 10, 2020, with heavy hearts, we canceled the in-person meeting in response to the perilous spread of the SARS-CoV-2 virus worldwide.

As is customary in the introduction to a volume of proceedings, we will address the scholarly content of LSRL 50 by surveying the intellectual substance of our authors' contributions. But before we do so, we present a brief history of how the conference was salvaged so that we may honor the many individuals who offered us their assistance and encouragement during such a difficult time. We also hope that our experiences in organizing an online symposium may be of benefit to future organizers who need to be prepared to transition to a digital platform should unbidden events conspire to derail their planning. In our own case, luck intervened to help us carry through with our plans: just as we had to cancel LSRL 50, organizers from the University of Massachusetts Amherst announced on the LINGUIST List listserv that the 33rd Annual CUNY Human Sentence Processing Conference would be held March 19–20, 2020, as planned, but in an online format. In early April 2020, we contacted Professor Mara Breen, the named conference organizer for CUNY 2020, for guidance and information about hosting a synchronous, online conference. In her response, she graciously detailed their strategies. Many of their practices were translated into a "How to LSRL 50!" link on our conference website that instructed attendees on (i) downloading and troubleshooting on the platform Zoom, (ii) registering to attend sessions, (iii) presenting papers and posters, and (iv) chairing sessions.

With a group of then-graduate students in Romance Linguistics from our institution – Dr. Aris Clemons, Dr. Tracey Adams, Dr. Joshua Franks, Anna Lawrence, and Luis Avilés González – we mapped out the logistics of a new conference schedule. In May 2020, we constructed a Qualtrics survey for our plenary and juried paper speakers and our session chairs. The survey gauged their interest in participating in the conference virtually, their preference for attending the conference on consecutive days or split over two weeks, and their preference for dates in late June or early July of 2020. Nearly everyone responded that they would be pleased to participate, and within four months of the cancellation of the physical conference, we hosted the event synchronously, online.

"LSRL 50 v2.0," as we affectionately call our digital version, featured a novel program schedule. In order to accommodate a full conference program of juried

papers, plenary talks, workshops, and a poster session, and to mitigate "Zoom fatigue," we shortened the time allotted for the presentation of juried papers from the normal 20 minutes to 15, with the usual five minutes for discussion. And, rather than unfolding over three consecutive days, the conference was held over five days, July 1–3, 2020 and, a week later, July 6–8, 2020, but only for three contiguous hours each day, from 17:00 to 20:00 GMT. This timing allowed individuals worldwide to attend the conference at reasonable morning, evening, or nighttime hours according to their respective time zones.

The program included 56 selected juried papers that were classified thematically into 15 separate sessions. The conference also included sessions for the juried posters, the keynote presentations, the two workshops, the business meeting, and the virtual "happy hour" with a trivia contest. Attendance at any session was free, but individuals had to register for the session with its Zoom host, either one of us or one of the remarkable graduate students, named above, who helped us get the conference off the ground. To register, prospective attendees clicked on the name of the Zoom host listed on the conference program that was published on our website. This automatically generated an email request for an invitation to the session. The Zoom host responded to each request with an invitation that provided the Zoom meeting ID for the session and a suggestion that the participant add the event to their calendar. The requirement to register for the conference provided us with a layer of security from "Zoom bombers" who otherwise could have interrupted sessions with malicious content. But, having experienced no outside interventions during the conference itself, we began to broadcast all the meeting IDs for the day's sessions to the current attendees via the chat box in Zoom. This afforded them more spontaneity in choosing which session to attend and allowed them to move from one concurrent session to another if they chose to do so.

A week before the conference, all presenters and session chairs were invited to a training session with their Zoom host. The pre-training assured us that everyone was comfortable with the platform and could access it from their locations. It also permitted speakers and chairs to interact before the conference and to ask any questions they might have about procedures before the event. That the essential participants of each session had already met and interacted with one another *before* the conference took place was one of the many benefits of hosting the conference online. Unlike an in-person conference, there were no mispronounced speakers' names and, during the conference, there was a notable congeniality between the speakers, the session chair, and the Zoom host that lent a more open, collaborative air to the discussions of the papers than that which often occurs in person.

While the face-to-face interactions that normally occur at the LSRL were certainly missing in the online incarnation of LSRL 50, there were many benefits of the online format. The conference was free to anyone who wished to attend and accessible anywhere via an internet connection. This expanded the geographic exposure for LSRL and increased the number of attendees quite substantially. On Day 1 alone, we welcomed more than 250 individuals from every continent except Antarctica. And no session was attended by fewer than 40 individuals. The technology of the Zoom platform served to enhance audience participation, too, as attendees were free to comment or pose questions of a speaker, at any time during the session, by using the chat function. And, at the conclusion of each session, the Zoom host remained in the meeting so that participants could interact with one another informally. In sum, transitioning to an online format provided a venue that fostered community among linguists worldwide and helped to decrease our sense of isolation during a difficult time.

#### **2 Situating the scholarly content of LSRL 50**

The origin of LSRL is rooted in generative grammar. First organized in February 1971 by Jean Casagrande and Bohdan Saciuk under the auspices of the Department of Romance Languages and Literatures and the Interdepartmental Linguistics Program at the University of Florida, the LSRL was billed as the *Linguistic Symposium on Romance Languages: Application of Generative Grammar to their Description and Teaching*. The aim of the conference then, as now, was to contribute to the description of Romance languages while highlighting essential data in the evaluation, testing, and revision of theoretical proposals (Casagrande & Saciuk 1972). The first LSRL attracted research presentations on Romance languages from notable linguists of the era including Ronald W. Langacker, William Cressey, Albert Valdman, Sanford Schane, Maria Luisa Rivero, and Richard Kayne. The significance and impact of the symposium cannot be overstated: over the years, scholars of Romance linguistics have notably informed the direction of inquiry in general linguistics. As one example, Kayne's (1972) seminal contribution on syntax, *Subject inversion in French interrogatives*, from the very first LSRL continues to accrue citations today. Distinguished linguists have continued to lay out their influential research programs in the proceedings of this venue, including Luigi Burzio, Luigi Rizzi, Margarita Suñer, Anna Cardinaletti, Karen Zagona, and Maria Luísa Zubizarreta in morpho-syntax and semantics; Irene Vogel, James Harris, Donca Steriade, Pilar Prieto, and José-Ignacio Hualde in phonology and phonetics; James Lantolf, Carmen Silva-Corvalán, David Birdsong, and Julia Herschensohn in bilingualism and second language acquisition; Shana Poplack, Raymond Mougeon, and William Ashby in sociolinguistics; and Yakov Malkiel, Dieter Wanner, and Jurgen Klausenburger in historical linguistics, among many, many others. Through the years, this state-of-the-field event has continued to attract the participation of prominent linguists and their students who have contributed to the further development of theoretical models and have helped to steer the field of linguistics in new directions. Many of them, like the current authors, have held or hold their appointments in language departments, where they serve as the point of first contact with the field of linguistics for legions of undergraduates. Collectively they have mentored generations of younger scholars who have gone on to complete their degrees in linguistics from language and from linguistics departments.

In celebrating the 50th anniversary of the LSRL, we envisioned a conference that would branch out from its strong theoretical roots and lay the groundwork for replicable research programs and data sharing practices that are necessary to move the field forward. In this, we built on and drew from successful iterations of previous meetings in emphasizing the empirical turn of Romance linguistics research, in particular LSRL 34 (University of Utah), which focused on experimental approaches; LSRL 43 (CUNY Graduate Center), with a workshop on parsed corpora; LSRL 48 (University of Delaware), which focused on bridges to other disciplines; and LSRL 49 (University of Georgia), whose theme was "Big Data."

In the interest of documenting the history of the LSRL, we will archive the conference website, the conference Twitter feed from @LSRL50, and a complete database of the first 50 years of the published proceedings of the LSRL in the open access digital repository of Texas ScholarWorks of the University of Texas Libraries.

#### **3 The conference keynote addresses and workshops**

The keynote speakers of LSRL 50 were chosen to showcase research that employs diverse methods and different types of empirical observations applied to various Romance languages. These included two speakers from the hosting university: David Birdsong (Professor, French & Italian) and Patience Epps (Professor, Linguistics). David Birdsong is a psycholinguist of second language acquisition and bilingualism. His recent work concerns the measurement and predictive power of language dominance in bilinguals and the individual factors that affect ultimate attainment. He delivered a keynote entitled "Conceptualizing ultimate attainment in bilingualism and second language acquisition." Dr. Epps is a linguistic anthropologist known for her documentary research on Hup (an indigenous language of Brazil) and on the implications of language contact for establishing linguistic typologies and pre-histories. She is a proponent of open data and the creator of public databases and archival collections of audio, text, and image media from speakers of Latin American languages. Dr. Epps spoke on the effect of contact between the colonizer languages, Portuguese and Spanish, and the Amazonian indigenous languages in a talk entitled "Multiple multilingualisms: Indigenous and Romance languages in Amazonia."

Our external speakers were Thamar Solorio (Associate Professor, Department of Computer Science, University of Houston) and Zsuzsanna Fagyal (Associate Professor, Department of French & Italian, University of Illinois at Urbana-Champaign). Dr. Solorio specializes in the analysis of spontaneous language production, including forensic linguistics and clinical Natural Language Processing as applied to bilinguals. She analyzes written data from a variety of sources, including Twitter and speech transcripts. Bridging linguistics and STEM, Dr. Solorio delivered a keynote entitled "Enabling technology for code-switching data." Zsuzsanna Fagyal, a sociolinguist with a linguistic focus on prosodic variation, delivered a keynote address for LSRL 50, representing phonology and phonetics. The written version of her talk, entitled "For an integrative approach to variation and change in the French vowel system," appears in this volume.

The first workshop of the conference, "Wrangling linguistic data with Python," was created and facilitated by Jacqueline Serigos (Assistant Professor, Department of Modern and Classical Languages, George Washington University) with the goal of helping participants visualize and explore categorical language data, especially data that contain code-switches or borrowings. Dr. Serigos, who creates and uses large datasets in her research on Spanish, specializes in computational linguistics, language contact, semantics, and statistics. Luis Avilés-González (Ph.D. student in Hispanic Linguistics at the University of Texas) conducted a workshop/tutorial on LaTeX using Overleaf to help prepare the conference presenters for the eventual publication of their papers in this volume. His own research is dedicated to investigating sociolinguistic variation in the use of discourse markers among Mexican migrant and heritage Spanish speakers in Southern California.

#### **4 An overview of the contributions of our authors**

Like the LSRL 50 event, this volume speaks to the depth and vitality of the field. The invited and refereed chapters are authored by multiple generations of scholars, from those completing postgraduate degrees to senior researchers. Their projects showcase studies in established and emergent subfields and their attendant approaches and methodologies, directed at numerous Romance languages – Catalan, French, Italian, Picard, Portuguese, Romanian, and Spanish.

The compendium opens with the contribution by one of our invited plenary speakers. In Chapter 1, "For an integrative approach to variation and change in the French vowel system," Zsuzsanna Fagyal argues that the lowering of the French nasal vowels, with respect to their oral counterparts, was motivated by French speakers' accommodation to standardizing norms rather than by universal structural constraints on articulation. In a sweeping discussion that parallels the history of linguistics itself, Fagyal moves from the insights of linguists interested in the sociolinguistic history of French to the possibility that computational models might replicate the variation and change trajectory of nasal vowels in Romance languages.

Several of the ensuing chapters examine historical and on-going morpho-syntactic variation, contributing to debates surrounding the status and typologies of Romance languages as well as informing literatures and theories of microvariation. Chapter 2, "Assessing change in a Gallo-Romance regional minority language: 1pl verbal morphology and referential restriction in Picard," by Julie Auger and Anne-José Villeneuve, is an important addition to the research on minority language varieties and their preservation, especially pertaining to the consideration given to regional varieties whose language or dialect status remains controversial. This study evinces the significance of adopting a comparative approach in the evaluation of language change and variation across typologically related varieties, in particular when gauging whether or not closely related varieties are actually "different languages." The authors undertake a comparative investigation of a morphosyntactic change observed in French and Picard. Picard is an endangered dialect (stigmatized as an inferior, degraded variety) that has been at the center of substantial debate concerning its (dis)similarity with colloquial French. The change under analysis is the switch from first person plural to third person singular pronominal forms to refer to groups inclusive of the speaker, which was targeted in light of the commonly held position that French and Picard display sizeable differences at the phonological and lexical levels but are very close at the morphosyntactic level.
Drawing on data from an original corpus comprised of written Picard data from the mid-1900s to the present day, and contemporary spoken French and Picard data, the study contrasts current spoken data from Picard-French bilinguals with present and older written data from three Picard authors born between 1931 and 1960, taking into consideration semantic properties of the referents (i.e., restricted, specific unrestricted, or general unrestricted). Auger and Villeneuve's results uncover a different use of the two structures under analysis in French and Picard: the use of first-person plural is now marginal in colloquial French while it continues to be strong in Picard. Concerning semantic reference, in Picard, the third person singular appears to be linked primarily to general unrestricted reference but it is hardly used with restricted reference. Furthermore, it is revealed that the relative frequency of the two alternatives does not undergo notable change through time.

The resetting of the Null Subject Parameter in Brazilian Portuguese is explored by Mary A. Kato and Maria Eugenia Lammoglia Duarte in Chapter 3, "The partial loss of free inversion and of referential null subjects in Brazilian Portuguese." The work reviews scholarship that establishes Brazilian Portuguese as a Partial Null Subject Language since the middle of the 19th century; unlike European Portuguese, Brazilian Portuguese was shown to display partial optional referential subjects, as well as null expletives and free inversion with unaccusatives. Of interest for the present paper is the finding that by the 1950s, referential null subjects were lost in most contexts, though null expletives persisted. These changes were triggered by loss of rich inflectional morphology. More intriguing is the documentation of the gradual recovery of null subjects through literacy, as attested in correlation with formal instruction in elementary school. Attestation of null subjects ranged from 2.11% for 1st grade pupils to 49.62% among 7th/8th graders. The work is significant in its approach to parametric variation, substantiating and refining a theoretical construct with historical and contemporary sources of written and oral data.

In Chapter 4, "The antipassive as a Romance phenomenon: A case study of Italian," Karina High draws from original corpus data spanning from the 13th to the 21st century to trace the diachronic distribution of three pairs of Italian pronominal verbs and their non-pronominal, transitive counterparts (*lamentarsi/lamentare* 'lament/complain', *ricordarsi/ricordare* 'remember/remind', *vantarsi/vantare* 'praise/boast'). She analyzes *si* as a detransitivizing (i.e., valency-reducing) morpheme and proposes that the pronominal verbs instantiate antipassives since they exhibit distinctive structural behavior of the antipassive: they are syntactically intransitive but semantically transitive in that they involve demotion of the logical object to a non-core argument/oblique realized as a prepositional phrase headed by *di*. High's results indicate that all three reveal a high frequency from the 13th to the 15th century; from the 16th century, however, the frequency of non-pronominal transitive constructions begins to increase and they become predominant from the 17th century onward. It is suggested that the extension of *si* to the antipassive construction may have been eased by the low degree of transitivity of the verbs under consideration, which all select experiencer subjects and theme (minimally or not at all affected by the action) direct objects.

Additional chapters adopt a syntactic-theoretical lens to shed new light on problems in Romance linguistics. Evidencing the special place held in Romance linguistics by constructions involving SE, Irene Fernández-Serrano's Chapter 5, "The role of SE in Spanish agreement variation," addresses the free alternation between agreement and non-agreement patterns in Spanish SE structures that involve inanimate postverbal subjects (e.g., *se discutieron los resultados* vs. *se discutió los resultados*). Based on the analysis of spoken interview data gathered from existing corpora, which reveal intra-speaker variation between the two patterns, Fernández-Serrano argues that no specific (subject) properties can be identified which could be responsible for lack of agreement and offers a syntactic analysis of the asymmetry that focuses on intra-speaker free variation. In this view, the alternation between the agreeing and non-agreeing pattern resides in syntax. Following previous approaches that account for different syntactic outcomes through the timing of syntactic operations and embracing the intuition that the specific feature configuration of SE blocks person agreement with the subject, Fernández-Serrano puts forward the proposal that Spanish SE acts as a blocking element; that is, in non-agreeing SE constructions the clitic functions as an intervener that obstructs subject-verb agreement. Free alternation with the agreeing pattern, on the other hand, is accounted for in terms of the relative timing of the operations AGREE and MOVE.

Two chapters examine the distribution and interpretation of the null subject of non-finite (controlled) adjunct clauses. Katie VanDyne's "Object control into temporal adjuncts: the case of Spanish clitics," Chapter 6, provides original Spanish data that defy the long-established generalization that obligatory control into temporal adjuncts is limited exclusively to subject control. More precisely, her data evidence a notable contrast between full DP objects and clitic objects in postverbal position, on the one hand, and preverbal object clitics, on the other: *La policía los<sub>i</sub> está buscando después de PRO<sub>i</sub> robar un banco* vs. *\*La policía está buscando-los<sub>i</sub> después de PRO<sub>i</sub> robar un banco*. While the former constructions obey the proven pattern of objects being unable to exert control into an adjunct, the latter allow object clitics to do so optionally; that is, both subject and object control may obtain in this case. Adopting Landau's (2015) two-tiered theory of control, VanDyne accounts for the viability of either a subject or preverbal clitic controller attested in these new data by distinguishing between two positions available for the clitic within vP: if the clitic occupies Spec vP, it can be a controller by being the closest c-commanding DP to adjunct PRO; in contrast, if the clitic moves to an outer specifier, subject control obtains because the subject is closest to the adjunct. Chapter 7, "Overt vs. null subjects in infinitival constructions in Colombian Spanish," by Kryzzya Gómez, Maia Duguine and Hamida Demirdache, centers on the licensing of overt preverbal subjects in infinitival adjunct clauses in Colombian Spanish; the authors focus specifically on three types of adjunct clauses (introduced by *a*, *para*, and *sin*), which demonstrate disparate patterns of exceptions to generalizations about null infinitival subjects and their interpretations (cf. *Juan<sub>i</sub> sería feliz al él<sub>i/k</sub>/PRO<sub>i/\*k</sub>/José<sub>k</sub> dejar la casa; Juan<sub>i</sub> se fue para él<sub>i/\*j</sub>/PRO<sub>i/\*</sub>/\*María<sub>k</sub> estar feliz; María<sub>i</sub> dejó de trabajar sin ella<sub>i/k</sub>/pro<sub>i/k</sub>/Rosa<sub>k</sub> decir nada*). As argued, *a-*infinitives and *para-*infinitives display diagnostics of obligatory control, whereas *sin-*infinitives are characterized as non-obligatory control. The authors advocate for a novel DP-ellipsis analysis of *sin*-infinitives and an Anaphor Generalization to account for the conflicting patterns of interpretation for null vs. overt PRO in *para*-infinitives.

Chapter 8, "Oblique DOM and co-occurrence restrictions: How many types?" by Mónica Alexandrina Irimia presents a methodical examination of co-occurrence restrictions with oblique differential object marking (DOM) in standard and *leísta* Spanish and Romanian. The author carefully surveys the data, identifying six related puzzles in the differential behavior of oblique DOM clitics vs. full DPs and the lack of systematicity of available repair strategies. For instance: Why does a Spanish oblique DOM DP produce a PCC effect with an indirect object that is doubled by a dative clitic, as shown in *Le enviaron (\*a) todos los enfermos a la doctora*? And why does this same restriction obtain in Romanian when an oblique DOM DP binds into a dative clitic doubled indirect object (e.g., *Comisa le-a repartizat (\*pe) mai mulţi<sub>i</sub> medici rezidenţi unor foşti profesori de-ai lor<sub>i</sub>*)? Analyzing the rich and complex set of data, Irimia puts forth a cogent analysis that refines antecedent accounts based on the split Agree/Case, indicating the importance of the domain in which the relevant feature ([person]) is licensed. And Chapter 9, by Nicoletta Loccioni, "A superlative challenge for a syntactic account of connectivity sentences," offers a careful examination of Iberico-Romance specificational sentences and contrasts such as Italian *[L]a persona con cui Maria è più esigente è se stessa* vs. *La persona con cui ogni paziente è più onesto è il suo terapista.* Such data are of interest because they are known to exhibit connectivity effects with respect to Binding Theory; in addition, relativization in the specificational subject is required for the superlative reading of *più*. In accounting for these facts, researchers have followed syntactic and semantic lines of analysis.
Loccioni focuses on one syntactic account – *Question + Deletion*, put forth by Schlenker (2003) and Romero (2007, 2018) – highlighting the challenges presented when relative clauses are specificational subjects. While Loccioni does not articulate a solution to salvage the syntactic account, the author does outline the desiderata for a solution.

Several contributions to the volume investigate the intersection of phonetics and social identity in French. In Chapter 10, "Revisiting sociophonetic competence: Variable spectral moments in phrase-final fricative epithesis for L1 & L2 speakers of French," Amanda Dalola and Keiko Bridwell analyze the spectral qualities of fricative epithesis, or the fricative-like noise produced by French speakers at the ends of breath groups as vowels become progressively devoiced. This has frequently been referred to in the literature as final vowel devoicing. The authors argue, however, that the production of the fricative-like element is, in fact, effortful and not the result of the kind of phrase-final devoicing that occurs with attenuating energy. The participants for their controlled reading task included two groups, 40 L1 (French) speakers and 31 advanced L2 (French−English) speakers, to ascertain whether they produced fricative epithesis differently as measured by center of gravity, standard deviation, kurtosis, skewness, and intensity. The results showed an interaction between speaker group and vowel for skewness, with differences noted in the production of fricative epithesis following the vowel /y/ and in other measures as the vowel attenuated. The authors speculate that the /y/ vowel in French has social significance and is subject to hyperarticulation by L1 and L2 speakers alike, albeit to different degrees. Chapter 11, Hilary Walton's "Does social identity play a role in the L2 acquisition of French intonation? Preliminary data from Canadian French-as-a-second-language classroom learners," examines how social identity affects the performance of individual L2 learners in two different language learning contexts. These contexts are identified as "French immersion", where anglophone learners complete a mandatory number of hours across their curriculum from Grades 1 through 12 in the French language, and "core French", where study in French is optional after Grade 4.
While the results show that the immersion students self-report greater levels of in-group psychological attachment than their peers in core French, there were no statistically significant between-group differences in the linguistic dimension under study, pitch contours in non-final accentual phrases. Nonetheless, the study paves the way forward for future investigations of in-group linguistic accommodation within the context of L2 speech.

Bilingual and contact production are also at the center of several other chapters. Inspired by the lack of research on Catalan-influenced Spanish relative to the attention paid to Spanish-inflected Catalan, Annie Helms' Chapter 12, "Sociophonetic analysis of mid front vowel production in Barcelona," targets the contribution of age and gender to variation in the production of Spanish /e/ among Catalan–Spanish bilinguals. Her production study focuses on the production of Spanish /e/ in words that are cognate with Catalan, as cognates are argued to promote cross-linguistic interaction which, in turn, leads to the assimilation of similar phonological categories between languages. Measures of the first two formants of Spanish and Catalan /e/ productions were extracted from the productions of 17 bilingual speakers. Her findings show significant evidence of the production of a Catalan-like /e/ in Spanish, especially among young males, but no effect of cognate status. Helms explains these results in terms of an overall weakening of the Catalan front vowel contrast in Barcelona coupled with greater overall variability in the production of front vowels in Spanish and Catalan among younger speakers. In Chapter 13, "Prosodic correlates of mirative and new information focus in Spanish wh-in-situ questions," Carolina González and Lara Reglero empirically investigate the pragmatic meaning of two types of wh-in-situ questions among 22 Spanish speakers in the Basque Country of Spain. Their participants completed a contextualized elicitation task answering prompts designed to motivate either in-situ information seeking or echo surprise questions as the response. Their goal in gathering this data was to determine whether the prosodic correlates of the productions of their participants are compatible with mirative or with new information focus. The authors investigate the tonal shape of the question, as well as the configuration of its nuclear peak, the height of the peak in Hz and the range of the focus tone. Their findings indicate that focal tone range delineates the two types of in-situ questions with echo surprise questions showing an expanded tonal range relative to information seeking questions. They conclude that surprise questions show mirative focus.

In "Mechanical vs. functional processes in subject pronoun expression in Spanish second language learners," Chapter 14, Ana de Prada Pérez and Nick Feroce contribute to the growing body of literature on Spanish referential pronouns by examining their exponents among second language learners in comparison to bilingual native speakers. Data collected from sociolinguistic interviews show that the learner groups produce more overt pronominal subjects than the native speaker group – the low learner group in both 2sg and 3sg, and the high learner group in 3sg only. The variationist analysis returned differences between both learner groups and native speakers in sensitivity to Switch Reference, though in 1sg only. Perseveration was evidenced for 1sg, across all participant groups, but only the native speaker group showed the effect for 3sg. Finally, the interaction of perseveration and switch reference was similar across groups. The authors interpret the data as indicating that differences between learners and native speakers are restricted to rate and function of pronominal expression and that these factors operate differently over deictic and referential subjects. These results align with extant studies while contesting others; for the latter, the authors are meticulous in pursuing explanations in coding and analysis.

The final chapter considers a universal property of language – Zipf's law, which holds that there is a correlation between a word's frequency and its length – extending its scope. More specifically, Chapter 15, "Frequency and efficiency in Spanish proverbs" by Ernesto R. Gutiérrez Topete, examines whether, and to what extent, Zipf's law also applies to proverbs (e.g., *Lo que mal comienza mal termina*; *Más vale pájaro en mano que cientos volando*). The author scrutinizes 30 proverbs drawn from a collection of proverbs frequently attested in the press in Tucumán, Argentina, and evaluates their occurrence in a news media corpus collected from *News on the Web* (NOW) included in the online *Corpus del Español* (Davies 2018). The results of the study show a positive correlation between the length and the frequency of a proverb: Proverbs displaying a lower frequency rate are more resistant to shortening than proverbs with a higher frequency rate. However, some outliers are found, indicating that in addition to frequency other factors are at work. On the other hand, neither syntactic complexity nor variability appears to play a role in the proverbs' shortening rates.

#### **5 Acknowledgements**

We are very grateful to Mara Breen and the conference co-organizers of CUNY 2020 for leaving a detailed roadmap for hosting a conference online.

We are especially grateful to our former and current students who helped us at each stage of this adventure, many of whom we have already named. We thank Tracey Adams, Aris Clemons, Luis Avilés-González, Joshua Frank, and Anna Lawrence (profusely) for their assistance during the conference, particularly for their deft and undaunted mastery of digital environments. Other graduate students assisted in the planning phase of our "live" conference and we are grateful to them for their help: Salvatore Callesano, Victor Garre León, Tom Leslie, Amalia Merino, Marylise Rialliard, and Lamia Trifi. Special thanks go to Amanda Dalola, our tireless social media diva, for her awesome handling of the @LSRL50 Twitter handle. We thank the remarkable undergraduate student, Carol Zeng, for her insightful editing of these proceedings papers. For his LaTeX typesetting magic, we express our appreciation to Luis Avilés-González. Finally, we thank our conference participants for their patience, enthusiasm, and good humor during a stressful moment of history.

We express our gratitude to our keynote speakers and workshop conveners whose sessions drew the conference's largest audiences. And we thank our chairs for their fearlessness in taking on the task of emceeing a session in the age of Zoom: Richard Meier, Randall Gess, Bruno Estigarribia, Karen Zagona, Anna Maria DiSciullo, Silvina Montrul, Adrián Riccelli, Michael Newman, Teresa Satterfield, John MacDonald, Jose Camacho, Laura Colantoni, Tim Gupton, Beth MacLeod, Julie Auger, and Bradley Hoot.

We would like to acknowledge our gratitude to those who agreed to undertake abstract reviews for LSRL 50; they were completed with alacrity and thoughtfulness: Evangelia Adamou, Lourdes Aguilar, Gabriela Alboiu, Scott Alvord, Patricia Amaral, Mark Amengual, Raul Aranovich, Megan Armstrong, Karlos Arregi, Deborah Arteaga, Angeliki Athanasopoulos, Julie Auger, Jennifer Austin, Marc Authier, Laura Bafile, Brandon Baird, Aurora Bel, Judy Bernstein, Hélene Blondeau, Eulàlia Bonet, Travis Bradley, Barbara Bullock, Monica Cabrera, Andrea Calabrese, José Camacho, Richard Cameron, Rebeka Campos-Astorkiza, Anna Cardinaletti, Ana Carvalho, Isabelle Charnavel, Ioana Chitoran, J. Clancy Clements, Laura Colantoni, Sonia Colina, Marie-Hélène Côté, Maria Cristina Cuervo, Sonia Cyrino, Roberta D'Alessandro, Amanda Dalola, Justin Davidson, Laurent Dekydtspotter, Viviane Déprez, Anne Marie Di Sciullo, Manuel Díaz-Campos, Bryan Donaldson, Paola Giuli Dussias, Gorka Elordieta, Anna María Escobar, M. Teresa Espinal, Bruno Estigarribia, Antonio Fábregas, Timothy Face, Zsuzsanna Fagyal, Anamaria Falaus, Raquel Fernández Fuertes, Olga Fernández Soriano, Franck Floricic, Jon Franco, Angel Gallego, Charlotte Galves, Anna Gavarró, Randall Gess, Alessandra Giorgi, Ion Giurgea, Carolina González, Grant Goodall, Alex Grosu, Tim Gupton, Julia Herschensohn, Virginia Hill, Chad Howe, José Ignacio Hualde, Haike Jacobs, Mary Kato, Carol Klee, Karen Lahousse, Manuel Leonetti, Juana Liceras, John Lipski, Conxita Lleó, Ruth Lopes, Luis López, Jonathan MacDonald, Bethany MacLeod, Rita Manzini, Fernando Martínez-Gil, Ana-Maria Martins, Diane Massam, Jaume Mateu, Eric Mathieu, Natalia Mazzaro, Egle Mocciaro, Fabio Montermini, Jean Pierre Montreuil, Silvina Montrul, Francisco Moreno Fernández, Michael Newman, Jairo Nunes, Rafael Nuñez-Cedeño, Antxon Olarrea, Dan Olson, Francisco Ordóñez, Sandra Paoli, Diego Pescarini, Pierre Pica, Acrisio Pires, Cecilia Poletto, Pilar Prieto, Elissa Pustka, Michael
Ramsammy, Rajiv Rao, Lisa Reed, Lara Reglero, Peggy Renwick, Lori Repetti, Gemma Rigau Oliver, Yves Roberge, Ian Roberts, Francesc Roca, Marcos Rohena-Madrazo, Rebecca Ronquest, Johan Rooryck, Edward Rubin, Cinzia Russi, Andrés Saab, Nuria Sagarra, Mario Saltarelli, Liliana Sánchez, Teresa Satterfield, Leonardo Savoia, Cristina Schmitt, Sandro Sessarego, Miguel Simonet, Petra Sleeman, Jason Smith, Carmen Dobrovie-Sorin, Laura Spinu, Dominique Sportiche, Jeffrey Steele, Jacqueline Toribio, Annie Tremblay, Mireille Tremblay, Michelle Troberg, Myriam Uribe-Etxebarria, Elena Valenzuela, Barbara Vance, Ioana Vasilescu, Julio Villa-Garcia, Anne-José Villeneuve, Irene Vogel, Lydia White, Erik Willis, Caroline Wiltshire, Malcah Yaeger-Dror, and Karen Zagona.

A special word of gratitude is owed to the colleagues who shared their time and expertise in reviewing papers submitted for these refereed proceedings: Lourdes Aguilar Cuevas, Alex Alsina, Mark Amengual, Richard Cameron, Justin Davidson, Carmen Dobrovie-Sorin, Bryan Donaldson, Antonio Fábregas, Franck Floricic, Joshua Frank, Ángel Gallego, Randall Gess, Joshua M. Griffiths, Alex Grosu, Julia Herschensohn, José Ignacio Hualde, Carol Klee, Luis López-Carretero, Bethany MacLeod, Jonathan MacDonald, Ana Maria Martins, Natalia Mazzaro, Egle Mocciaro, Jairo Nunes, Rafael Orosco, Luis Ortiz, Sandra Paoli, Rajiv Rao, Ian Roberts, Teresa Satterfield, Laura Spinu, Andrés Saab, Jacqueline Serigos, and others who wished to remain anonymous.

In closing, we wish to thank our sponsors for LSRL 50: The National Science Foundation and, from the University of Texas at Austin: The College of Liberal Arts, The Center for European Studies, The Department of Linguistics, The Department of French and Italian, The Department of Spanish and Portuguese, and The Department of Germanic Studies.

#### **References**


## **Chapter 1**

## **For an integrative approach to variation and change in French nasal vowel systems**

### Zsuzsanna Fagyal<sup>a</sup>

<sup>a</sup>University of Illinois Urbana-Champaign

The complex dynamics driving the evolution of phonemic nasal vowels have long puzzled linguistic historians. This paper, focusing on French, discusses some of this rich complexity through a glimpse at internal (linguistic) and external (social) dynamics that led to the development of multiple nasal vowel systems. It suggests that parallel findings from acoustic and articulatory phonetics (production), psycholinguistics (perception), and computational modeling could be aligned to shed new light on some of the mechanisms behind well-attested historical and ongoing changes. Comparative historical and experimental data, coupled with computational modeling, could provide new ways of approaching the evolution of nasal vowel systems in Romance.

### **1 Introduction**

Vowel nasalization has had a widespread impact on the phonological systems of Romance languages and, in a few cases, led to lexically distinctive nasal vowel phonemes. While the acoustic and articulatory properties of nasalization exhibit universal properties, the same cannot be said about the evolution of nasal vowel phonologies. In fact, even nasal vowel systems within the same Romance language show such degrees of variation that it would be difficult to offer generalizations of common trajectories and pathways of evolution.

And yet, for many decades, "French [has been] widely believed to provide the classic example of the nature and characteristics of vowel nasalization" (Sampson 1999: 1). In his Preface of *Nasal Vowel Evolution in Romance*, Sampson states that one of his main motivations for writing his treatise was to inspire others to move beyond French as the purportedly universal key to nasalization in Romance:

/…/ it is hoped that the present work will prove of some interest to general phonologists who wish to know more of the diversity and complexity of nasal vowel evolution in Romance beyond the familiar developments found in French (Sampson 1999: v).

He argued that the persistent myth of standard French as the measuring stick of nasal vowel evolution is an artifact due to the unacknowledged bias towards a culturally prestigious Romance language. Upon closer examination, he suggested that many developments in French taken as "general guiding principles of change" have arisen from "exceptional circumstances" motivated by social factors.

In her *Linguistic Change in French*, Posner also reiterates the prevailing convention of the universality of nasal vowel evolution in French. Similar to Sampson, she points to the importance of standardization and focuses her critical analysis on earlier claims of vowel height hierarchy:

[…] The lowering of nasal vowels in standard French has led some theorists to postulate that there is a universal tendency for nasality to sit better on low vowels. [But] the long-held doctrine of nasalization *par étapes*, with low vowels succumbing earlier than high vowels no longer gets support from the French philological evidence. (Posner 1997: 235–236).

In this paper, I argue in support of social motivations behind some of these changes in the nasal vowel system of so-called standard French. Based on a short review of historical evidence and contemporary studies of dialectal variation, I suggest that the lowering of the nasal vowel system appears to be the outcome of large-scale accommodation to emerging pronunciation norms in the variety of French selected as the language of the administration by the French kings starting from the 16th century. In light of other varieties that do not exhibit such realizations, the lowering of nasal vowels, indeed, appears exceptional rather than universal. If this accommodation hypothesis is true, then patterns of nasalization known from French, the longest-serving prestige lingua franca in Europe until the 19th century (Wright 2004), have been very likely reinterpreted as universal and subsequently attributed to other Romance languages.

However, the many ways in which social evaluation could have influenced the selection of norm-conforming nasal pronunciations among educated native speakers over time would be impossible to prove by controlled experiments and direct observations. Therefore, I suggest turning to indirect evidence from two dialects of French that reveal possible articulatory strategies whose acoustic effects could have been perceived to match the desired outcomes of prestigious pronunciation. In the last section of the paper, I discuss the implications of this type of "integrative approach" for future studies.

### **2 The historical record**

The processes that led to phonemic nasalization in French are generally thought to be well-understood and indeed largely considered universal. They are typically explained in three steps:


The long vowel /i/ in the Latin *vinum* 'wine', as shown in (1), first nasalizes to /vĩ/ and then, after losing its conditioning nasal consonant in coda position, lowers to a more open nasal vowel /ɛ̃/, one of the four lexically distinctive nasal phonemes attested in modern French. The Latin long vowel in the first syllable of the word *plenum* 'plenty', shown in (2), first diphthongizes to /plẽĩn/ and, in so-called standard French, lowers and eventually becomes monophthongal (/ɛ̃/) in *plein* 'full'. Like most monosyllabic words with a word-final nasal vowel phoneme, *vin* 'wine' and *plein* 'full' are lexically distinctive and stand in phonemic oppositions, among others, with *vent* /vɑ̃/, (ils) *vont* /vɔ̃/, *plan* /plɑ̃/, and *plomb* /plɔ̃/.


As far as vowel inventories are concerned, the rich allophonic nasal system of Early Old French (Figure 1), characterized by widespread contextual nasalization, became reduced to fewer allophones and, in some dialects, to monophthongs during the Middle French period. By the end of the 16th century, nasal consonants following tautosyllabic nasal vowels sounded less prevalent (*amuïssement*), which made some historians speculate on their fusion with the preceding vowel, yielding combined perceptual effects of lingual and nasal articulations (Morin 1994: 34).

Figure 1: Early Old French (langues d'Oïl, tenth-eleventh centuries) exhibited many nasal allophones in the vowel system.

Although the pronunciation of nasals in Oïl varieties remained variable and, at the same time, quite specific to certain dialect areas, educated Parisian speakers were beginning to solidify their evaluations of what constituted socially desirable pronunciation in the newly forming administrative standard. Tendencies pointed to a general simplification of the nasal vowel system, albeit not without lengthy competitions and memorable debates that are well documented in the French grammarian tradition.

For instance, less than a decade after grammarians declared that the confusion between *in* and *en* should be considered "a vice of provincial influence" (D'Aisy 1685), the merger of the two unrounded front vowels /ẽ/ and /ɛ̃/ was considered unremarkable even by authoritative commentators such as Abbé Dangeau (1694: 76) who declared no longer hearing any difference between nasal vowels in *certain* 'certain', *dessein* 'purpose', and *divin* 'divine'. The high back vowel series underwent the same evolution. 16th-century grammarians were still condemning the pronunciation of 'ouïsites', people who pronounced the vowel "o" like "u" in words such as *chose* 'thing', as these speakers also tended to have a less open, u-like pronunciation of "o" before nasals in words such as *dunc* 'so' and *meuchiseu* 'Sir'. However, towards the mid-17th century, a more open and centralized realization of the back nasal in "oN" letter sequences won out in nearly all lexical classes in the emerging standard language and the more closed, u-like realizations of the vowel became associated with the lower classes:

The opposition of the learned classes and the grammarians to the pronunciation of *oN* in *ou* grows in the seventeenth century and denasalized *o*-s pronounced *ou* are denounced along with pure 'ouïsmes' (Ruhlen 1979: 334).

Figure 2: Schematized depiction of the general lowering of the standard French vowel space resulting in a four-way system.

The resulting subsystem of nasal vowels in French selected as the future administrative standard comprised only a few nasal vowels (Figure 2), nearly all of them preferably pronounced as monophthongs. Occasional suggestions of a high nasal vowel (/ĩ/) continued to surface, "confined to the prefix *in-*/*im-* in learned words" (Sampson 1999: 82), but its use was attributed to learned styles of speaking rather than emerging phonological contrasts. The well-known four-way system, still the norm in conservative standard usage and orthoepic conventions today, became established and reinforced by increasingly unified typographic conventions in the following centuries.

It must be pointed out that the precise quality of these vowels is difficult to reconstruct. One reason is the nature of philological evidence that is necessarily filtered through typographic choices and spelling conventions that tend to obscure representations of variation in vowel height. When discussing the transcription of nasal vowels, for instance, Hansen (1998: 67) refers to "the graphic *definition* of nasal vowel phonemes" (emphasis mine). Others note that discussions of the underlying representations of nasal vowels are always presented in terms of letter sequences (V, VN), contributing to simplified interpretations of the physiology of nasalization (Posner 1997: 236). Thus, it is likely that, starting from the early modern era, heavy orthographic bias in the representation of nasal vowels obscured some of their unique dialect-specific qualities.

In the modern era, the historically attested general lowering of the nasal vowel system of French spoken in and around Paris continued. The front nasal /ɛ̃/ tended to both lower and centralize to /ɑ̃/, even in word pairs where lexical distinctions could be jeopardized (Fónagy 1989), as in examples (3) through (5):

(3) a. C'est intérieur. 'This is interior.'

	- b. C'était aux Andes. '*This* was in the Andes.'
	- b. Quel beau temps! 'What nice weather!'

While such observations have been made in earlier studies, Fónagy insisted on their increasing frequency, coupled with the tendency of the open nasal /ɑ̃/ to also increasingly sound like the back nasal /ɔ̃/. His tentative conclusion based on his own impressionistic observations indicated ongoing mergers (*neutralisation*) between these vowel pairs, particularly in younger speakers' speech:

L'expérience quotidienne indique que les phénomènes de neutralisation sont beaucoup plus fréquents dans la parole des jeunes Parisiens et Parisiennes que dans les groupes d'âge de 50-60 ou 60-70 ans (Fónagy 1989: 232). 'Everyday experience indicates that neutralization phenomena are much more frequent in the speech of young Parisians than in the age groups of 50–60 or 60–70.'

Although Fónagy argues that the perceptual overlap of these vowels rarely, if ever, endangers "the correct transmission of the message" (Fónagy 1989: 228) thanks to the context that can disambiguate between competing lexical meanings, he does not even evoke the interpretation of "chain shift" as an alternative explanation to vowel merger. And yet, the idea of a "counterclockwise push shift" cannot be excluded in light of experimental evidence (Malécot & Lindsay 1976) showing that, following World War II, the rounded front nasal vowel /œ̃/ merged with the front nasal vowel /ɛ̃/. This merger could have set up a chain shift "pushing" /ɛ̃/ to lower and centralize, i.e., to encroach on the vowel space of /ɑ̃/, which in turn, in order to preserve its distinctiveness, came to sound more like /õ/ in some contexts.

Exceptions to these patterns, however, are numerous in regional varieties of French in France and Quebec. Gendron's (1966) classic description of the front nasal vowel /ɛ̃/ as particularly fronted and close to /ẽ/ in Quebec, and Walker's (1984: 81) observations that "the phonetic realizations of the four nasalized vowels are significantly distinct from those of SF (standard French)" resonated well with historians. Posner, for instance, noted that "[lowering] has not operated in many varieties of French, especially in northern France and Canada, where the tendency … is to front /ɑ̃/ to /ã/ and /ɛ̃/ to /ẽ/" (Posner 1997: 235).

These speculations led to some tangible hypotheses about the production of these differences in vowel quality, which will be explored in the next section.

#### **3 Production**

Contrary to received wisdom that nasality arises solely from the lowering of the velum, the variability of nasal vowels attested in the historical record comes from the acoustic effects of three articulatory sources:


These gestures are combined in idiosyncratic, speaker-dependent strategies that can be equally good ways of pronouncing target-like nasal vowels. When combined successfully to achieve this effect, the goal of articulatory strategies is to enhance the acoustic effects of nasalization, i.e., maintain and even reinforce the clarity and distinctiveness of each nasal phoneme.

In a combined acoustic and articulatory study involving Northern Metropolitan French (NMF) and Quebec French (QF) speakers, Carignan (2013) looked at velic, lingual, and labial articulatory patterns of three of the four nasal vowels in comparison with their oral counterparts. He showed that in the majority of his NMF speakers' speech, the front nasal vowel /ɛ̃/, such as in *pain* 'bread', had a relatively high first formant (F1) and low second formant (F2), which came to overlap with the acoustic space occupied by oral /a/. He therefore suggested alternative phonetic transcriptions for the three nasal vowels (Figure 3).

Furthermore, one would expect /ɑ̃/, as in *paon* 'peacock', to occupy an acoustic space near its oral counterpart /a/, but nasal /ɑ̃/ was realized instead with a lower F1 and F2, bringing nasal vowel /ɑ̃/ near the acoustic space occupied by oral /o/ (Figure 4). For the majority of the NMF speakers, the back nasal vowel /õ/, such as in the word *pont* 'bridge', was also realized with a relatively low F1 and F2 compared to /o/, which brought /õ/ near the acoustic space occupied by the high back vowel /u/. These results showed acoustic and articulatory evidence that nasal vowels in this variety continue their historical path of lowering and are possibly participating in a chain-shift.

Figure 4: Schematic representation of NMF participants' idealized nasal vowel space based on Carignan (2013)

Results were more tentative on the acoustic effects of lingual articulations in the Quebec French (QF) variety, featuring speakers from the rural Saguenay region of Quebec (Figure 5). They showed that the back nasal vowel /õ/ is lowered toward the acoustic space of the open nasal /ɑ̃/, but that /ɑ̃/ was not systematically fronted and raised towards the front nasal, as a clockwise chain shift would predict (Figure 6). The dynamics of the realizations of the front nasal vowel /ɛ̃/ also went in the expected direction of a clockwise shift, but in terms of lingual articulations, the nucleus of the vowel was not different from the front oral vowel /ɛ/. Thus, a clockwise chain shift in the QF dialect could not be robustly confirmed on acoustic and articulatory grounds. These findings do lend support to those "considerable differences" between the two vowel systems, signaled by Walker (see above).

Figure 5: Average formant frequencies of QF speakers participating in Carignan's (2013) experiment (Nicholas et al. 2019: 1209)

These variations in nasal vowel quality are consistent with what are typically called formant-frequency-related acoustic effects of nasalization on the vowel space, which means that acoustic output from nasalization is attributed to the movements of the tongue and the lips. Thus, diachronically, it is likely that as subsequent generations of native speakers perceived the effects of nasalization in the newly emerging standard variety (now NMF), they reanalyzed its sources as lingual and labial articulations and did their best to imitate what they heard. In this way, one might speculate, the lowering and centralizing acoustic effect of nasalization – which is indeed one of the universal properties of velo-pharyngeal coupling – became phonologized in the language over time.

Figure 6: Schematic representation of QF participants' idealized nasal vowel space based on Carignan (2013)

Next, we will turn to the production-perception interface in two varieties of French to understand how dialect-specific acoustic variations can inform same- and cross-dialectal perception of nasal vowels. The results can help us make better predictions of listeners' interpretations of simultaneous acoustic cues of different articulatory origins and, ultimately, their accommodations to such patterns in their own speech.

#### **4 Perception**

In a series of studies, Nicholas (2018), Nicholas et al. (2019), and Nicholas & Fagyal (In review) tackled the production-perception interface of nasal vowel realizations in two varieties of French. Some of their research questions were:

• **Q1:** When presented to native listeners of French, would phonetic realizations of front (/ɛ̃/) and open (/ɑ̃/) nasals in Northern Metropolitan French and of open (/ɑ̃/) and back (/ɔ̃/) nasals in Quebec French be identified with less accuracy, possibly because they encroach on each other's vowel spaces in their respective counterclockwise push (NMF) and clockwise pull (QF) shifts?

Our expected answer was "yes, they would" since these vowels are at the onset of each hypothesized nasal vowel shift, respectively, and therefore should show greater formant overlap.

• **Q2:** Would less familiarity with each dialect result in greater difficulty distinguishing nasal vowel contrasts cross-dialectally? And does expertise with the sounds of the language – teaching them and researching them – help distinguish especially difficult nasal vowel contrasts?

Again, our expected answers were "yes" since nasal vowels recorded in a less widely diffused rural variety of French in Quebec could be expected to be more challenging to identify cross-dialectally. Also, expertise – defined as frequent and in-depth exposure to the sounds of the language – could facilitate the accuracy of perception.

• **Q3:** Does greater vowel duration help the accuracy of nasal vowel identifications in Quebec French where progressive increase in the degree of nasalization has been attested in previous studies?

Our hypothesis was that increased vowel duration in Quebec French should improve the accuracy of nasal vowel perception in both dialects.

#### **4.1 Experiment**

We used stimuli from Carignan's (2013) production study targeting the two dialects: a lowered and retracted vowel space showing the three target words in Northern Metropolitan French (Figures 3 and 4, above), and the same target words in a raised and fronted vowel space in Quebec French (Figures 5 and 6, above).

Seventy listeners took part in a computerized forced-choice gating experiment constructed in E-Prime 2.0 and run by Nicholas (2018). The 19 women and 12 men who came from the Saguenay-Lac-Saint-Jean dialect area in Quebec and the 20 women and 19 men from the greater Paris area in France were divided into four age and occupational groups based on age categories taken from the 2017 Canadian Census.

The 72 target words were selected from Carignan's (2013) corpus. They were monosyllabic real words in which a word-initial /p/ or /t/ was followed by one of the three nasal vowels. The six distractor words contained the oral counterparts of each nasal vowel. All words were nouns in order to control for morpho-syntactic category. There were three different repetitions per word per speaker in two separate sessions with a break in-between.

The target words were presented using the gating paradigm. Listeners saw the question "Which word do you hear?" written on the computer screen in French and were asked to choose the number key on the keyboard that corresponded to their answer: 1 or 2. At the first gate, they heard the first half of the vowel; at the second gate, they heard the full vowel; and at the third gate, they heard the full word that included the onset consonant. Listeners could only hear each gate once. They had to press the space bar to proceed to the next gate or to the next word pair, which was shown on a separate screen. They were not told until after the experiment that they would be hearing two different dialects.
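The three gates described above can be expressed as playback intervals over a segmented recording. The following is a minimal sketch, not the E-Prime implementation used in the study: the function name `gate_intervals`, the `Interval` type, and the availability of segment boundaries as time stamps in seconds are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def gate_intervals(
    word_onset: float,
    vowel_onset: float,
    vowel_offset: float,
    word_offset: Optional[float] = None,
) -> List[Interval]:
    """Return the playback intervals for gate 1, gate 2, and gate 3.

    All boundary values are hypothetical inputs; the study does not
    report how its stimuli were segmented.
    """
    if not (word_onset <= vowel_onset < vowel_offset):
        raise ValueError("expected word_onset <= vowel_onset < vowel_offset")
    if word_offset is None:
        # Assumption: the word ends with the nasal vowel (e.g. a CV word).
        word_offset = vowel_offset
    if word_offset < vowel_offset:
        raise ValueError("word_offset cannot precede vowel_offset")

    midpoint = vowel_onset + (vowel_offset - vowel_onset) / 2
    return [
        Interval(vowel_onset, midpoint),      # gate 1: first half of the vowel
        Interval(vowel_onset, vowel_offset),  # gate 2: full vowel
        Interval(word_onset, word_offset),    # gate 3: full word, onset consonant included
    ]
```

With invented boundaries such as `gate_intervals(0.0, 0.125, 0.375)`, the first gate spans the vowel from its onset to its temporal midpoint, and only the third gate reaches back to the onset consonant.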

The order of the words on the screen and of the appearance of target words and distractors were randomized and counterbalanced across six experiment lists. We used the lme4 package in R with *participant* as random intercept and *age*, *gender*, *dialect*, *target-competitor vowel pair*, and the interaction between *dialect* and *vowel pair* as fixed effects. To determine the significance of fixed effects, we used the mixed function from the *afex* package.
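The model structure just described can be written out as an equation. This is a sketch under assumptions the text does not state: that accuracy was coded per trial as correct (1) or incorrect (0) and fitted with a logit link, and that each categorical predictor contributes one coefficient per non-reference level. The symbols ($y_{ij}$, $\beta$, $u_j$, $\sigma_u^2$) are introduced here for illustration only.

```latex
% y_{ij}: response of participant j on trial i (1 = correct, 0 = incorrect)
% u_j:    by-participant random intercept
\[
\operatorname{logit} P(y_{ij} = 1)
  = \beta_0
  + \beta_{\text{age}}\,\text{age}_{j}
  + \beta_{\text{gender}}\,\text{gender}_{j}
  + \beta_{\text{dialect}}\,\text{dialect}_{ij}
  + \beta_{\text{pair}}\,\text{pair}_{ij}
  + \beta_{\text{dialect}\times\text{pair}}\,(\text{dialect}\times\text{pair})_{ij}
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
\]
```

Under this reading, the interaction term is what allows the accuracy of a given target-competitor vowel pair to differ between the two dialects.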

#### **4.2 Results**

The effects of *dialect*, *target-competitor vowel pair*, and the interaction of *dialect* and *target-competitor vowel pair* on accuracy of vowel contrasts were significant. The effect of age on accuracy varied depending on the dialect and was not as robust for native dialect identification in the NMF as in the QF group. Gender was not significant.

As hypothesized, in NMF stimuli heard by NMF listeners, /ɛ̃/ was often mistaken for /ɑ̃/, but the opposite was not true: /ɑ̃/ was never mistaken for /ɛ̃/ (Figure 7). Also, /ɑ̃/ was frequently misidentified as /ɔ̃/, while /ɔ̃/ was always accurately identified in contrast with /ɑ̃/. Cross-dialectally (Figure 8), there was confusion when /ɑ̃/ was contrasted with /ɛ̃/ and when /ɔ̃/ was contrasted with /ɑ̃/, while identifications in the opposite directions were much more accurate: /ɛ̃/ was identified nearly categorically when contrasted with /ɑ̃/ and /ɑ̃/ was distinguished from /ɔ̃/.

Figure 7: Accuracy in the perceptual identifications of nasal vowel contrasts: NMF listeners listening to NMF nasal vowel contrasts. Red arrows indicate incorrect identifications; green arrows indicate correct identifications (based on Nicholas et al. 2019)

Figure 8: Accuracy in the perceptual identifications of nasal vowel contrasts: NMF listeners listening to QF nasal vowel contrasts. Red arrows indicate faulty identifications; green arrows indicate correct identifications (based on Nicholas et al. 2019)

When QF listeners heard target words in their own dialect (Figure 9), they found /ɑ̃/ difficult to distinguish from /ɛ̃/, but the opposite was not true because /ɛ̃/ was identified nearly categorically in contrast with /ɑ̃/. Similarly, supporting the interpretation of a possible clockwise shift, /ɔ̃/ was often misheard as /ɛ̃/, but not the other way around, as /ɛ̃/ was heard as nearly categorically distinct from /ɔ̃/.

Figure 9: Accuracy in the perceptual identifications of nasal vowel contrasts: QF listeners listening to QF nasal vowel contrasts. Red arrows indicate faulty identifications; green arrows indicate correct identifications (based on Nicholas et al. 2019)

When QF listeners heard NMF stimuli (Figure 10), vowels adjacent to each other along the peripheral tract of the vowel space, yet again, provoked confusion when contrasted with other vowels: /ɛ̃/ was often confused with /ɑ̃/ and /ɑ̃/ was often taken for /ɔ̃/. However, identification was nearly categorical when /ɑ̃/ was contrasted with /ɛ̃/, and when /ɔ̃/ was contrasted with /ɑ̃/. As expected, there was more confusion cross-dialectally than within dialects, especially when NMF listeners listened to QF, which, as mentioned above, might be due to unfamiliarity with the rural variety of QF used in the experiment. Also, as predicted, NMF listeners did benefit from longer stimuli in QF and showed higher accuracy. However, increased vowel durations proved less useful for QF listeners listening to NMF throughout the three gates.

Figure 10: Accuracy in the perceptual identifications of nasal vowel contrasts: QF listeners listening to NMF nasal vowel contrasts. Red arrows indicate faulty identifications; green arrows indicate correct identifications (based on Nicholas et al. 2019)

In light of a forthcoming study in which in-depth knowledge of and professional work with the language were also considered (Nicholas & Fagyal In review), we can also confirm that expertise with the language matters: NMF "Experts" listening to NMF input showed significantly greater accuracy in identifying nasal vowel contrasts than their "non-Expert" counterparts (Figure 11a) when perceiving contrasts between the front /ɛ̃/ and the open /ɑ̃/ nasals, and between the open /ɑ̃/ and the back /ɔ̃/ nasals, which show overlap in the acoustic space due to the NMF counterclockwise shift. Notice that the listeners' difficulties, again, did not extend to any other contrasts; main difficulties in perception were limited to contrasts with the greatest variability, possibly due to ongoing change. The same held cross-dialectally. Although both "Expert" and "non-Expert" QF participants performed below chance on stimuli from the NMF dialect (Figure 11a), "Experts" still performed better than "non-Experts". The same was true when "Expert" vs. "non-Expert" QF participants listened to QF input (Figure 11b): "Experts" performed slightly better than "non-Experts" in the most difficult contrasts. When it comes to QF input heard by NMF "Expert" and "non-Expert" listeners, just like their Canadian counterparts hearing NMF stimuli, the "non-Experts" performed below 50% accuracy. In other words, they chose the intended nasal vowel less than half of the time in two of the most difficult vowel contrasts, which are especially variable, possibly due to ongoing change.

Taken together, these results show that dialect-specific acoustic variations are challenging to perceive for all, but especially for non-native listeners of the dialect and with respect to the most variable stimuli. Prolonged exposure to the dialect, however, makes perception more reliable, even in the most variable contexts. These patterns can be predicted to shape listeners' own interpretations and, possibly, replication of these patterns in their own speech.

Figure 11: Perceptual accuracy among NMF and QF listeners listening to NMF input (top) and the same listeners listening to QF input (bottom). Figure reproduced with permission from Nicholas & Fagyal (In review)

1 Integrative approach to variation and change in French nasal vowel systems

#### **5 Modeling**

How to integrate the last piece of the puzzle – computer simulations of the actual accommodation processes – will have to be determined in the next few decades in computational sociolinguistics. What seems clear is that traditional experimenting and testing must be integrated into broader, multi-pronged approaches to social variation and change in historical times.

Agent-based computational models of vowel shifts have been proposed in the sociolinguistic literature since the early 2000s, with the intention of simulating the collective social dynamics between populations that come into contact with each other. Applied in particular to the Northern Cities Vowel Shift, Swarup & McCarthy's (2012) model, for instance, incorporated several empirically derived rules of vowel change of the Northern Cities Shift together with psychological processes, such as representational momentum, accounting for the ways in which exemplars of various vowels are copied – imitated – by the simulated social agents. Sociolinguists and applied computer scientists at the University of Illinois have also investigated the role of centrally and peripherally connected individuals in the diffusion of lexical innovations (Fagyal et al. 2010).

Such models could be used to simulate chain shifts in nasal vowel systems in large groups of agents that come in contact in specific ways over the course of a simulated history of events. Quite a lot is known about the long history of dialectal separation between French spoken in France and Quebec. Thanks to church records, demographic data of migrants and settlements are also abundant, which could help speculate on the social dynamics of separation and convergence between certain segments of French-speaking populations at a given time in history.

What is needed is information about the parameter space, the production and perception of the segments at play (e.g., nasal vowels), and ideas of how to model and interpret their interactions with social factors. Together, they could allow the simulation of population-level norms being negotiated and adopted, as in the following simulation carried out for the study of Fagyal et al. (2010): https://nssac.bii.virginia.edu/~swarup/animations/degree_biased_voter_model.mov. The flickering red and blue dots indicate ongoing negotiations (one agent copying another one's usage of a variable), while uniform red and blue dots signal the stabilization of a norm (the adoption of an innovation) over another.
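The copying dynamics just described can be illustrated with a minimal sketch of a degree-biased voter model. This is not the implementation used for Fagyal et al. (2010): the toy network (a ring plus one highly connected hub), the two variant labels, the population size, and all function names are assumptions made for illustration only. At each step, a randomly chosen agent copies the variant of one of its neighbors, and better-connected neighbors are more likely to be copied.

```python
import random


def make_ring_with_hub(n):
    """Hypothetical toy network: a ring of n agents plus a hub (agent 0) linked to everyone."""
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        neighbors[i].add(j)
        neighbors[j].add(i)
    for i in range(1, n):
        neighbors[0].add(i)
        neighbors[i].add(0)
    return {i: sorted(ns) for i, ns in neighbors.items()}


def step(variants, neighbors, rng):
    """One negotiation: a random agent copies a neighbor picked with probability proportional to that neighbor's degree."""
    agent = rng.choice(sorted(variants))
    candidates = neighbors[agent]
    weights = [len(neighbors[c]) for c in candidates]
    model = rng.choices(candidates, weights=weights, k=1)[0]
    variants[agent] = variants[model]


def simulate(n=30, max_steps=100_000, seed=1):
    """Run until all agents share one variant or max_steps is reached; return (steps, final state)."""
    rng = random.Random(seed)
    neighbors = make_ring_with_hub(n)
    # Assumed starting point: each agent uses "red" or "blue" at random.
    variants = {i: rng.choice(["red", "blue"]) for i in range(n)}
    for t in range(max_steps):
        if len(set(variants.values())) == 1:
            return t, variants
        step(variants, neighbors, rng)
    return max_steps, variants


if __name__ == "__main__":
    steps, final = simulate()
    print(steps, sorted(set(final.values())))
```

Calling `simulate()` returns the number of copying events performed and the final state. When every agent ends up with the same label, the simulated population has settled on a single norm, which corresponds to the uniform coloring in the animation.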

This integrative approach is deductive in nature. It starts out with broad generalizations of possible scenarios of accommodation at the population level, whose validity is tested using the computational modeling of social dynamics underlying the multiple instances of accommodation in production and perception between individuals of a large and evolving population. If successful, such simulations could open a new chapter in studies of vowel evolution in Romance, as well.

#### **Acknowledgements**

I would like to thank Christopher Carignan, Jessica Nicholas, and Samarth Swarup for our long-term collaboration and co-authorships. Many thanks to Joseph Roy, Marissa Barlaz, and Gyula Zsombok for their work as data management and analysis consultants at the University of Illinois at Urbana-Champaign. My gratitude goes to colleagues at the Illinois Phonetics and Phonology Laboratory and the Phonetics Laboratories of the *Université du Québec à Chicoutimi* and *l'Université de Paris III*. I am also thankful for the National Science Foundation Grant 1121780 awarded to Ryan Shosted (PI), C. Carignan, and Z. Fagyal (co-PIs) between 2011 and 2013, which helped advance research on nasal vowels in French significantly.

### **References**




Posner, Rebecca. 1997. *Linguistic change in French*. Oxford: Clarendon Press.


## **Chapter 2**

## **Assessing change in a Gallo-Romance regional minority language: 1pl verbal morphology and referential restriction in Picard**

Julie Auger<sup>a</sup> & Anne-José Villeneuve<sup>b</sup>

<sup>a</sup>Université de Montréal <sup>b</sup>University of Alberta

This paper examines the possible change from 1pl to 3sg forms when referring to a group that includes the speaker in Picard, a Gallo-Romance language of Northern France. Using older and contemporary Picard written data, as well as contemporary oral data, we show that, even though Picard and colloquial French use the two forms, the two languages differ. Contrary to colloquial French, where 1pl usage has become marginal, 1pl remains widely used in Picard. Our analysis of semantic reference (restricted, specific unrestricted, or general unrestricted group) indicates that 3sg is primarily associated with general unrestricted reference in Picard and is barely used to refer to restricted groups. Most interestingly, the relative frequency of the two variants remains stable over time. Our analysis demonstrates the importance of considering linguistic conditioning through the comparative method for assessing language change in typologically related varieties, especially when testing claims that a minority language is converging toward its dominant counterpart.

### **1 Introduction**

The debate over whether a given linguistic variety constitutes an autonomous language or a dialect of another variety is typically of little interest to formal linguists; what matters to us is that the system analyzed is coherent and that its linguistic forms are generated by the same mental grammar. However, such a question may have far-reaching consequences for endangered Romance languages, especially in the European sociopolitical context, as only languages that are "different from the official language(s) of that State" may be recognized and protected under the Charter for Regional and Minority Languages (Council of Europe 1992). Thus, while varieties like Catalan, Francoprovençal and Occitan differ sufficiently from Spanish, Italian or French to unequivocally qualify for official recognition and support, regional varieties whose language-versus-dialect status is the object of debate do not benefit from the same protections. For instance, the Gallo-Romance varieties spoken in Northern France (e.g., Norman, Picard), although formally listed as "regional languages of France" in a 2013 report from the French Ministry of Culture and Communication (DGLFLF 2013), continue to be perceived by many in the general public as "bad", "corrupt", or, more neutrally, regional varieties of the national language, Continental (or Hexagonal) French<sup>1</sup> (Éloy 1997a). Such a perception has contributed to the stigmatization and lack of transmission of these varieties, as well as to their continued exclusion from official school curricula, even though such an inclusion is allowed, for example, by the Deixonne law and the more recent Lang initiative (Éloy 1997b), and more generally, to the refusal to grant them official recognition and protection at the national and European levels. In these situations, comparative sociolinguistic research, through its careful examination of variation patterns that focus on both the distribution of variants and the linguistic conditioning behind variant selection, can be of service to language policy makers. Specifically, it can serve as a tool for assessing whether the linguistic distance between two closely related varieties may be sufficient to call them "different languages" such that they can be recognized and protected under the European Charter for Regional and Minority Languages.

The status of Picard, an endangered Gallo-Romance language of Northern France, is the object of considerable debate. While scholars recognize that the two varieties' phonology and lexicon differ considerably, Éloy (1997b: 137) argues that Picard's morphosyntax does not significantly differ from that of colloquial French. Evidence against Éloy's position is provided by detailed analyses of specific constructions. For example, Burnett & Auger (2018) have shown that negation in Picard is realized through two different elements, *point* and *mie*, and that the latter serves to negate presuppositions and express emphasis. Auger (2020) has shown that the Vimeu variety of Picard possesses two different subject neuter clitics, *a* and *ch*, whose distribution depends on the type of predicate that they combine with. Thus, whereas French uses the same neuter pronoun, *ce* (and its colloquial variant, *ça*), with nominal and adjectival predicates (e.g., *C'est mon ami* 'it is my friend' and *C'est beau* 'it is beautiful'), Picard uses different pronouns: *Ch'est un gros férmieu* 'it is an important farmer' and *a n'est mie bieu* 'it is not beautiful'. For negation and neuter pronouns, the grammatical difference between Picard and (colloquial) French is clear. However, for the numerous morphosyntactic structures that are shared by the varieties, the difference is less clear. In these cases, we suspected that refusals to recognize the grammatical autonomy of Picard rely on superficial comparisons. In order to test this suspicion, we have carefully analyzed data collected from a bilingual Picard–French community of practice located in rural Picardie to determine how much Picard and French morphosyntax truly differ. This work has shown that shared morphosyntactic structures function differently in Picard and in colloquial French. This is the case, for example, for subject doubling and *ne* deletion: whereas the co-occurrence of subject doubling and *ne* presence is marginal in French, due to the opposite stylistic values of the two forms, this combination is the most commonly attested in Picard (Villeneuve & Auger 2013).

<sup>1</sup>We use "Continental French" to refer to the variety of European French spoken in Continental France.

This paper examines first person plural verbal morphology (henceforth 1pl), a variable which, superficially, seems to support Éloy's convergence claim. As we see in Table 1, French makes use of 1pl and 3sg indefinite pronouns to refer to a group that includes the speaker, whereas Picard makes use of a single pronoun form for both 1pl and indefinite 3sg reference: *oz*.<sup>2</sup> In the case of Picard, verbal morphology distinguishes the two persons: an *–ons* ending for 1pl in most tenses and the absence of overt marking for 3sg.

<sup>2</sup>*Os* is also used as a 2pl subject pronoun. Once again, verbal morphology distinguishes 2pl from 3sg.indefinite and 1pl, as we can see below:

	- b. *os cant-ons* [one/we/you.PL sing] 'we sing'
	- c. *os cant-eu* [one/we/you.PL sing] 'you.pl sing'

The 1pl and 2pl pronouns result from the loss of the initial consonants in *nos* and *vos*, respectively. 3sg *os* results from the denasalization of *on* 'one' in unstressed position (Hrkal 1910: 260–261). All three pronouns are pronounced [o] before a consonant and [oz] before a vowel.


Table 1: 1pl and 3sg verbs in French and in Picard

Previous variationist work has shown that the use of *nous* has become marginal in many varieties of colloquial French (e.g., 1.6% in Montréal, Laberge 1977: 132; 4.4% in Picardie, Coveney 2000: 466; see also King et al. 2011). To this day, no comparable analyses have been undertaken for Picard. Thus, in this paper, we seek to determine whether the replacement of 1pl by an indefinite 3sg pronoun is observable in Picard and to establish whether the constraints that favor the selection of person operate similarly to what has been described for colloquial French.

#### **2 1pl verbal morphology in French and Picard**

The variation between *nous* -*ons* and *on* + 3sg in French has received considerable attention from linguists and sociolinguists. Because, throughout much of the history of French, 1pl has involved the pronoun *nous* followed by a verb suffixed with -*ons*, we might think that the use of *on* with a 3sg verb form and the concomitant reduced occurrence of *nous* -*ons* reflects a gradual replacement of the latter form by the former. However, there are reasons to question such a scenario. Indeed, while *nous* as a subject pronoun is widely attested in written documentation produced by literate speakers of French ever since Old French,<sup>3</sup> some scholars have raised doubts concerning its use in the speech of lower-class speakers. Citing Coveney (2000) and Lodge (2004), King et al. (2011) invoke the widespread use of *je* -*ons* (cf. 1a) forms and the rarity of *nous* -*ons* forms (cf. 1b) in representations of lower-class speech from the 16th through 18th centuries. Thus, the question remains whether the near-categorical use of 3sg *on* in Québec (Laberge 1977), Picardie (Coveney 2000), and Switzerland (Fonseca-Greber & Waugh 2003) results from the replacement of *nous* by *on* or from the disappearance of the *je* -*ons* form. Additional support for the latter hypothesis is found in Flikeid & Péronnet's (1989) analysis of 1pl pronouns and verbs in *Atlas linguistique de la France*, which confirms the rarity of *nous* -*ons* forms in the northwestern parts of France, with the exception of the former Somme and Pas-de-Calais *départements*, where *os* -*ons* forms dominate.<sup>4</sup>

<sup>3</sup>We thank Barbara Vance for confirming this information.

(1) a. Moi et le gros Lucas, et **je** nous **amus-i-ons** à bâtifoler avec des mottes de tarre

me and the.M big Lucas and I us enjoy-PST-1pl to fool.around with some clumps of dirt

(*Don Juan*, Act II, scene 1, 1665)

'Me and big fat Lucas, and we were having fun fooling around with clumps of dirt'

b. qu'il aille au diable avec son mulet! ... **nous** **ir-ons** devant les juges

that he go at.the.M devil with his mule! ... we go.fut-1pl before the.pl judges

(*Les Fourberies de Scapin*, Act I, scene 1, 1671)

'he can go to hell with his mule! ... we shall go before a judge'

c. je ne sais pas quand **on** **verra** finir ce galimatias

I neg know not when one will.see.3sg finish this.m gobbledygook

(*Sganarelle*, scene 22, 1660)

'I don't know when we will see the end of such gobbledygook'

The prevalence of *os* -*ons* forms for 1pl is attested since at least Middle Picard for the western parts of the Picard-speaking area and the 17th century for the southern portions of the Picard-speaking area (Flutre 1970: 140, 147). Monographs from the turn of the 20th century, such as Edmont (1897/1980), Ledieu (1909/2003), and Hrkal (1910), provide support for the results from the *ALF*. Vasseur (1996) confirms the prevalence of the *os* -*ons* construction in Vimeu Picard, while Picard textbooks mention only this form for 1pl (Debrie 1983, Dawson & Smirnova 2020: 86). No description of Picard mentions the use of 3sg *os* as a competing form for 1pl inclusive reference.

<sup>4</sup>Our consultation of all relevant ALF maps for 1pl on the Symila website (http://symila.univ-tlse2.fr/) confirms Flikeid & Péronnet's (1989) generalization based on 3 maps.

As we have already mentioned, considerably more is known about 1pl variation in French than in Picard. Coveney (2000) and Fonseca-Greber & Waugh (2003) show that use of subject *nous* is very infrequent in Continental and Swiss French varieties. The results compiled from other studies by King et al. (2011: 501) reveal frequencies of *nous* varying between 0.25% and 2.6% in Québec and Ontario French. For Acadian French, their compilation indicates variation between *je* -*ons* and 3sg *on*, with no tokens of *nous*. As for Louisiana French, the pattern differs based on the location investigated: for the Cajun varieties of the coastal marshes, Rottet (2001: 197) reports the gradual loss of 3sg *on* in favor of disjunctive *nous*-*autres*, while Dajko (2009: 148) observes an overwhelming preference for 3sg *on* in Lafourche Parish, along with very low frequencies for null pronouns, *nous*-*autres on*, and *nous*-*autres*.

The historical and variationist analyses of the variation between *nous* -*ons* and 3sg *on* also inform us about the factors that favor the two variants and, consequently, about the path taken by the grammaticalization process by which the latter replaces the former as 1pl. King et al. (2011) coded for the linguistic factors that influence the grammaticalization of *a gente* as a 1pl pronoun in Brazilian Portuguese (Zilles 2005), namely verb tense, verb class, clause type, and referential restriction. Of the four factors considered, only the last one, referential restriction, was found to play a significant role in their French data. Given that *on* has historically expressed indefinite reference, it is not surprising that its use is most strongly favored for unrestricted groups whose membership includes individuals who do not belong to a speaker's network.

Twentieth-century varieties of Modern French fail to provide appropriate data for testing the grammaticalization process whereby subject *nous* gives way to *on*, either because *nous* is so marginal that a quantitative analysis is impossible or because the variation that persists involves different variants (*je* -*ons* vs. *on* in Acadian French; *nous* -*ons* vs. *on* in Continental, Swiss, and Québec French). Additionally, uncertainty remains concerning the use of *nous* -*ons* forms by lower-class speakers in previous centuries, which makes it difficult to evaluate the factors that have influenced the rise of 3sg in French. Consequently, we believe that Picard provides the perfect testing ground for gaining a better understanding of the gradual replacement of 1pl forms by 3sg ones. Indeed, the frequent use of 1pl subject pronoun and verbal morphology that characterizes western dialects of Picard, along with the possibility of an increase in the use of 3sg variants as seen in our preliminary analyses, provides the type of data that will allow us to determine the effect played by referential restriction (see below) on the choice between traditional 1pl *oz* -*ons* and innovative 3sg *oz*.

#### **3 Methodology**

Our recent work assesses the degree of structural morphosyntactic convergence and divergence between French and Picard by analyzing data from the Vimeu area, located in rural Picardie, France. In continuity with our previous work, we examine three types of data: contemporary oral data for French and Picard, contemporary written Picard, and older written Picard data from the 1940s to the 1960s. Our Vimeu Picard and French contemporary oral data are extracted from sociolinguistic interviews with four Picard–French bilingual men and supplemented by Vimeu French oral data from a control group of four French monolingual men (see Villeneuve & Auger 2013 for a detailed description); in this paper, we focus on the bilingual data described in Table 2.<sup>5</sup>


Table 2: Oral Picard and French corpus, bilingual speakers' demographic information (adapted from Villeneuve & Auger 2013: 119)

Because of the methodological challenge that the assessment of morphosyntactic variation in regional minority languages represents, due, for instance, to limited amounts of oral data on which to perform quantitative analyses (see Auger & Villeneuve 2017: 552), we compare our contemporary Picard oral data from bilinguals with contemporary and older written data from three Picard authors born between 1904 and 1959, as shown in Table 3.<sup>6</sup> Vasseur's and Dulphy's data come from weekly columns published in newspapers. Leclercq's text is a novel that tells the story of a young Picard man in the 1950s. This three-way comparison allows us to assess the degree of similarity between bilinguals' French and Picard production, measure the distance between the written and oral community norms, and assess diachronic change based on written data.

<sup>5</sup>The absence of women in our corpus stems from the gender imbalance in the number of regional minority language speakers and in their daily use of the language (Pooley 2003). It is therefore difficult to find a reliable, balanced sample of female Picard speakers.

<sup>6</sup>Given that Picard is strongly associated with orality, it may seem somewhat ironic to seek linguistic data from written texts. However, thanks to the relatively large amount of such texts and to the fact that written Picard faithfully mirrors the spoken language (Auger 2002, 2003), we are confident that this approach can help us determine whether Picard 1pl verbal morphology is changing and, if so, whether it is converging toward French.

| Generation | Author | Lifespan | Text & publication year |
|---|---|---|---|
| 1 | Gaston Vasseur | 1904–1971 | *Lettes à min cousin Polyte* (1938–1971) |
| 2 | Jean Leclercq | 1931–2021 | *Chl'autocar du Bourq-Éd-Eut* (1996) |
| 3 | Jacques Dulphy | 1959 | *Ch'Dur et pi ch'Mo*, Tome III (2011) |

Table 3: Written Picard corpus over three generations of authors (adapted from Auger & Villeneuve 2019: 218)

We extracted all instances of unambiguous 1pl reference from our Picard and French corpora. As is customary, our data collection excluded contexts where no variation is possible, such as fixed expressions (e.g., *o diroait qu'* 'it seems like', *conme o dit* 'as we say'). Each token was subsequently coded for the binary dependent variable, i.e., 1pl or 3sg verbal morphology, and for a variety of independent variables: verb tense, the presence or absence of an overt semantic reference expression (see 2a–2b), as well as restriction and specificity of the 1pl semantic reference. In this paper, we follow the example of King et al. (2011) and focus on referential restriction.

b. **Oz** **é-r-ons** eune armée forte pour pu avoér la djerre

we have.fut-1pl an.f army strong for no.longer have the.f war

(*Lettes* 1945, 165)

'We [implied: all French citizens] will have a strong army to no longer have war'

Our coding for referential restriction followed Boutet's (1986) ternary distinction based on restriction and specificity, as operationalized in Rehner et al. (2003) and King et al. (2011: 482). Specifically, we distinguished between a restricted group which is specific and includes only people known to the speaker, such as members of their family (see 3a), a specific unrestricted group of individuals, some of whom may not be known by the speaker, such as employees of a large factory or all French people (see 3b), and a general unrestricted group – humankind, people in general – which includes the speaker (see 3c).<sup>7</sup> Our overall data set includes 61 tokens of unambiguous 1pl for which the discursive context did not allow us to reliably determine whether the group being referenced was restricted and/or specific; these were coded as "ambiguous" for semantic reference.<sup>8</sup>

a. c'est pour ça qu' **nous** **av-ons** appelé notre fille [Marie].

it is for that that we have-1pl called our daughter Marie

(Jérôme D.)

'this is why we [my wife and I] called our daughter Marie'


'at some point, one may be Catholic [but] some behaviours take over'
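The ternary coding scheme for referential restriction, together with the residual "ambiguous" category, can be summarized as a small decision procedure. The sketch below is only an illustration: the function name, the two yes/no questions, and the use of `None` for "cannot be determined from the discursive context" are assumptions, not the authors' actual coding protocol.

```python
from typing import Optional


def code_reference(specific: Optional[bool], all_known: Optional[bool]) -> str:
    """Assign a 1pl token to a referential category.

    specific:  does the token refer to an identifiable group (True) or to
               people in general (False)? None if the context does not say.
    all_known: are all members of the group known to the speaker?
               None if the context does not say.
    """
    if specific is None:
        return "ambiguous"
    if not specific:
        # Humankind or people in general, with the speaker included.
        return "general unrestricted"
    if all_known is None:
        return "ambiguous"
    return "restricted" if all_known else "specific unrestricted"


# Hypothetical codings modeled on the kinds of groups mentioned in the text.
print(code_reference(specific=True, all_known=True))    # the speaker and his wife
print(code_reference(specific=True, all_known=False))   # all French people
print(code_reference(specific=False, all_known=None))   # people in general
```

Keeping "ambiguous" as an explicit outcome mirrors the decision to set aside tokens whose reference could not be reliably determined rather than forcing them into one of the three categories.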

Although King et al. (2011: 482) "did not include reference to humankind as a whole in the unrestricted group due to the difficulty of distinguishing such utterances from indefinite reference", the rich discursive context of our written data allows us to expand on their work by further distinguishing references to general unrestricted groups that include the speaker and all of humanity, i.e. general 1pl semantic reference, from 3sg indefinite reference. For instance, the French *on* in (4) unambiguously refers to an indefinite 3sg – the speaker was a child during the war and did not participate in the violence described – and the Picard *o* in (5) unambiguously excludes the speaker, who is instead included in the object pronoun *no*. While both studies exclude examples of this type from the variationist analysis of 1pl, our analysis includes utterances like (6), where the discursive context, which explicitly refers to the time when the letter's author and his addressee were young, makes it clear that the unrestricted general group includes the speaker. This methodological decision allows for a more fine-grained data set on which to test the role of semantic specificity on the incursion of 3sg into the 1pl domain.

<sup>7</sup>General unrestricted references include examples that include all of humanity at a past time; e.g., *Au XVI siècle, on mourait beaucoup plus jeune* 'In the 16th century, one died much younger'.

<sup>8</sup>An anonymous reviewer asks how we have coded the semantic reference of examples such as *Alors, on se proméne?* 'So, one's taking a walk?', where *on* refers to a neighbor that the speaker would pass on the sidewalk. Such examples are excluded from our analysis, as they do not meet the definition for our variable, that is, a pronoun that refers to a group of speakers that includes the speaker.

(4) French

**on** **fais-ait** sauter leur maison ou bien on les tu-ait.

one made-3sg burst their.f house or one them kill-3sg

(Joseph L.)

'their houses would get bombed or they would get killed'

(5) Picard

J'ai idèe […] qu' o no prind pour des coéchons

I have idea […] that one us take.3sg for some.m pigs

(*Lettes* 1946, 152)

'I think […] that we are taken for pigs'

(6) Picard

O din-ouot à trouos heures, t' in souviens -tu ?

we lunch-ipfv at three hours you.refl of-it recall you

(*Lettes* 1956, 638)

'we used to have lunch at 3 o'clock, do you remember?'

#### **4 Results**

Let us now turn to the results of our quantitative analysis. First, our contemporary oral data, reported in Figure 1, show a clear dominance of the innovative French-like 3sg form: use of the 1pl form is marginal in both oral French (1.9%, N = 368) and oral Picard (15.1%, N = 338). This pattern stands in sharp contrast with our contemporary and older written data, where 3sg is far from generalized (54.6%, N = 1,304). Unsurprisingly, texts appear more conservative than spontaneous speech.


Figure 1: 1pl in Vimeu French and Picard

The high proportion of 3sg forms in interviews could be interpreted as evidence that oral Picard is converging toward French, a language where the change from 1pl pronoun and inflectional morphology to 3sg morphology is quite advanced. In fact, a similar pattern emerged from a previous analysis of verbal negation in the same oral corpus (Villeneuve & Auger 2013). However, a closer examination of linguistic factors reveals that a large proportion of the 1pl forms found in our oral data refers to specific restricted groups, as exemplified in (7), where the 1pl verbs refer to the participants in a specific hunting event. 3sg is nevertheless also attested in these semantic contexts, as in (8), where the 1pl and 3sg verbs refer to the speaker and his fellow students.

(7) Picard

**Oz** we **ons** have.1pl veillé stay.up ein one.m tchot little molé bit pi and **oz** we **ons** have.1pl fini finish pér by nos us adoveu doze.off (Joël T.,320)

'we stayed up a bit and we ended up dozing off'

(8) Picard

Mais but nous, us **o** we **sav-o-ème** knew-pst-1pl bien, well à at l' the.f école school normale teacher.training que, that quand when **oz** we **ét-o-ème** were-pst-1pl avec with éch' the.m professeur professor **o** we **dis-o-ait** said.3sg « pluriel », pluriel mais but quand when **oz** we **ét-o-ait** were.3sg intré between nous, us **o** we **dis-ou-ot** said.3sg « pluriél ». pluriél

(Joseph L., 51)

'But we knew well, at teacher training school that when we were with the professor we said "pluriel", but when we were among us, we said "pluriél"'

The frequency with which the 1pl form is still used in written Picard can shed light on the mechanism behind similar morphological changes in Romance languages. Specifically, our 592 tokens of 1pl *o* -*ons* forms (or 45.4% of our written data), carefully coded for referential restriction, represent a valuable data set with which to test the effect of referential restriction on 1pl morphology. Indeed, a detailed analysis of 1pl semantic reference indicates that the innovative French-like 3sg form is still primarily associated with unrestricted general reference in written Picard (91.9%, N = 678 vs 15.3%, N = 626 in other contexts) and is barely used to refer to restricted groups, as we can see in Figure 2.

Figure 2: 1pl and semantic reference in written Picard
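The overall proportions reported for the written data can be re-derived from the token counts given in the text. The short Python sketch below does only that arithmetic. The variable and function names are illustrative assumptions and are not part of the original analysis, and the per-context 3sg counts are not reconstructed, since the text reports them only as percentages.

```python
# Token counts as reported in the text for the written Picard data.
TOTAL_WRITTEN = 1304   # all written tokens (N = 1,304)
ONE_PL_TOKENS = 592    # tokens of the 1pl "o -ons" form
GENERAL_N = 678        # tokens with unrestricted general reference
OTHER_N = 626          # tokens in all other semantic contexts


def share(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical helper values: the 3sg count is simply the remainder.
one_pl_share = share(ONE_PL_TOKENS, TOTAL_WRITTEN)
three_sg_share = share(TOTAL_WRITTEN - ONE_PL_TOKENS, TOTAL_WRITTEN)

# The two semantic contexts should partition the written data set.
contexts_cover_data = (GENERAL_N + OTHER_N == TOTAL_WRITTEN)

print(f"1pl: {one_pl_share}%  3sg: {three_sg_share}%")
print("contexts sum to total:", contexts_cover_data)
```

Under these assumptions, the two shares correspond to the 45.4% and 54.6% figures cited above, and the two context sizes add up to the full written data set.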

Although the use of the 1pl form remains much more frequent in written than in oral Picard, there is a possibility that its frequency may be gradually decreasing over time. One piece of data that suggests such a decrease comes from a real-time analysis of the chronicles in Vasseur's *Lettes*. Since these chronicles were published over a period of 33 years, we can compare the rate of use of 1pl over time for an individual author. This comparison reveals an apparent decrease in 1pl use across this portion of Vasseur's lifespan, from 44.7% in the 1940s to 35.6% in the 1960s.

In order to test the possibility of a change in progress in the Vimeu Picard community more broadly, i.e., the gradual replacement of 1pl by 3sg, we turn to our data from three different authors who represent more distant time periods: the 1940s–1960s, the 1990s, and the 2000s. As we can see in Table 4, the overall frequencies of 1pl do not suggest a gradual loss of 1pl, as the lowest frequency is found in the older data from Vasseur and the highest occurs in Leclercq's data. What these numbers do not tell us, however, is whether the 1pl and 3sg verbs used by the three authors have similar semantic distributions. Indeed, the greater use of 1pl in Leclercq's data may be attributable, at least in part, to the fact that his novel tells the story of a young man in the 1950s, a genre that may result in a higher number of restricted references than Vasseur's chronicles, which take the form of letters and postcards that discuss past and current events and relate them to the personal lives of their author and his addressee, or Dulphy's chronicles, which consist of conversations on current events between two men.

Table 4: Frequency of 1pl per author

In order to tease out the possibility that the different rates of 1pl in the three texts might be due to an uneven distribution of the data across semantic references rather than to change in progress, we now break down our data for each author by semantic category. Table 5 confirms that the distribution of semantic values differs greatly across texts, and that this difference provides a plausible explanation for the frequencies of 1pl. Indeed, Vasseur's text, which features the highest frequency of 3sg, has by far the largest proportion of unrestricted general referents, a context known to favor the innovative 3sg, while the one that has the highest proportion of 1pl, Leclercq's, contains the largest number of restricted group referents, a context resistant to the incursion of 3sg into the 1pl domain.

Table 5: Frequency of semantic reference type per author

Figure 3: 1pl and semantic reference in written Picard

We can now attempt to determine whether use of 3sg is really spreading over time in written Picard by breaking down our data by author and semantic reference, as shown in Figure 3. This nuanced breakdown reveals considerable stability over time. For general unrestricted referents, 3sg strongly dominates in all three authors, with an average frequency of 91.9%. For specific unrestricted and restricted groups, 1pl dominates in the data from all three authors. However, signs of opposite trends separate the more recent data (1990s and 2000s) from those from the mid-20th century. Surprisingly, use of 3sg decreases over time for specific unrestricted referents. But, most interestingly, use of 3sg for restricted groups, which was not attested in Vasseur's data, makes an appearance in the 1990s and 2000s data. Examples (9–11) attest to the variation between 1pl and 3sg in all three semantic contexts, namely unrestricted general (9), unrestricted specific (10), and restricted groups (11).

(9) Picard

a. **o** we n' neg **porr-ons** can-fut-1pl pu anymore vive live su on la the.f terre earth (*Lettes* 1956, 644)

'we won't be able to live on the land'

b. **o** one n' neg **laiche** let mie not mourir die parsonne anybody (*Lettes* 1946, 168)

'we don't let anyone die'

(10) Picard

a. o we n-n of-it av-ons have-1pl connu known deux two d' of djerres wars […], o we sav-ons know-1pl ch that qu' that i it n-n of-it est is (*Lettes* 1956, 655)

'we have gone through two wars, we know what it is'

b. o one s' self plaint complain souvint often in in France France qu' that oz one est is d' of trop too boin good […] (*Lettes* 1966, 1180)

'we often complain in France that we're too good'

(11) Picard

a. Nous us deux two mn' my.m honme, man o we n' neg é-r-o-éme have-fut-ipfv-1pl pu anymore qu' that à to minger eat (*Chl'autocar* 1996, 20)

'My husband and I, we'd only need eat'

b. oz one est is quate four chonq five camarades buddies à at l' the.f école school insanne together (*Chl'autocar* 1996, 59)

'we're four or five buddies in school together'

We close this section with a discussion of two examples drawn from newspaper chronicles that mix comments on current events and events from the personal lives of the characters that they feature; the two were written and published 60 years apart. The first example (12), published in 1946, features four tokens of 3sg and one token of 1pl. The first instance of 3sg occurs in a frozen phrase (*oz a bieu dire*) in which the subject has unrestricted general reference. While the last two do not occur in frozen phrases, they also have unrestricted reference. The second token, *oz étouot pététe gramint moins riches*, refers to the unrestricted but specific group of people who lived in the author's village and surrounding area. As for the only 1pl token, it refers specifically to the letter's author and his addressee. This short passage illustrates that, for Gaston Vasseur, 3sg and 1pl still have distinct meanings. Published in 2006, the second example (13) features four tokens: three 3sg and one 1pl. The first token illustrates the exclusive reference for which use of 1pl is excluded. The next two tokens of 3sg clearly refer to the two protagonists and are coreferential with the 1pl token, as the last sentence, which lists the people present at the *réveillon*, shows. Thus, even though use of 3sg for restricted reference remains infrequent in our most recent Picard data, this example provides evidence for a possible incipient change similar to the one that has taken place in French.

(12) Older written Picard (1946)

Mais, but **oz** one **a** has bieu beautiful dire, say Polyte, Polyte **oz** one **étouot** was pététe maybe gramint a-lot moins less riche rich du of-the.M.SG temps time qu' that **oz** we **alloémes** go-IPFV-1P au to-the.M.SG djignel, guignole **oz** one **étouot** was moins less riche, rich mais but **oz** one **étouot** was moins less béte, mean moins less mawais bad d' of l' the.SG un one.M à to l' the.SG eute. other

'But, it is all very well, we were maybe much less rich when we used to *alleu au djignel* (go door to door and ask for apples on December 24), we were less rich, but we were less stupid, less mean toward each other.' [*Lettes*, 169]

(13) a. Ch'Dur

ch' it est is point not pasqu' because **o** one n' NEG o has point not pérlè spoken d' of nous us qu' that **o** one n' NEG s' REFL a has point not vus. seen.PL Ti INT point not vrai, true, ch'Mo? ch'Mo 'it's not because they haven't talked about us that we haven't seen each other. Right, ch'Mo?'

b. Ch'Mo

Pour for seur! sure **O** one s' REFL a has meume even vu, seen et and pi rvu. seen-again **Oz** we ons have.1PL meume even rinvillonnè celebrated insanne. together. À At vo your moéson, house qu' that a it s' REFL a has passè. happened. Y there avoait was mi me pi and chol the.FEM Molle, Molle ti you.SG pi and chol the.FEM Dure, Dure és her mére, mother et and pi Niflette Niflette no our bétail animal dé of tchiénne… bitch 'For sure! We have even seen and seen each other, again and again. We have celebrated Christmas together. At your house, it was. There was me and chol Molle, you and chol Dure, her mother, Dorine, and Niflette our dog' [*Dur Mo*, 411]

#### **5 Conclusion**

The grammaticalization of pronouns and determiner phrases previously used to refer to indefinite referents into 1pl in French and in Brazilian Portuguese has received considerable attention from linguists. While previous studies have identified linguistic and social factors that favor this process, its analysis in contemporary French has suffered from two important limitations: the marginal use of the *nous* pronoun and the uncertainty concerning the specific 1pl form that has undergone replacement. The Picard data from the Vimeu region that we have analyzed in this paper circumvent both limitations, as use of *os* -*ons* is well documented historically and this form remains solidly implanted in contemporary usage. Our diachronic analysis of written data spanning from the 1940s until the 2000s reveals a Gallo-Romance variety that remains largely unaffected by the changes that have taken place in colloquial French and in oral Picard, and where the choice between 1pl and 3sg is strongly correlated with referential restriction. While unrestricted general referents strongly favor 3sg and show marginal use of 1pl, 1pl remains the almost exclusive variant for specific referents but shows some signs of incipient change. Interestingly, the semantic category that would be expected to serve as a gateway for the innovative uses of 3sg, namely unrestricted specific referents, appears to increasingly favor 1pl pronouns. Analysis of a larger corpus of written data from different genres and produced by a variety of authors will be necessary in order to confirm or disconfirm the results from our preliminary analysis.

In short, our examination of this variable demonstrates the importance of carefully considering linguistic conditioning through the comparative method when assessing language change in two typologically related varieties, especially when testing popular claims that a minority language is converging toward its dominant counterpart in a bilingual community. It also shows the importance of analyzing multiple linguistic features. Indeed, the conservative character of 1pl in Picard mirrors what has been reported for *ne* deletion, while contrasting with this variety's innovative character with respect to subject doubling and the generalization of a single auxiliary, *avoér* 'have' (Auger & Villeneuve 2017, 2019, Villeneuve & Auger 2013).

#### **References**


Julie Auger & Anne-José Villeneuve

Zilles, Ana M. S. 2005. The development of a new pronoun: The linguistic and social embedding of *a gente* in Brazilian Portuguese. *Language Variation and Change* 17(1). 19–53.

## **Chapter 3**

## **The partial loss of free inversion and of referential null subjects in Brazilian Portuguese**

### Mary A. Kato<sup>a</sup> & Maria Eugenia Lammoglia Duarte<sup>b</sup>

<sup>a</sup>Universidade Estadual de Campinas / UNICAMP <sup>b</sup>Universidade Federal do Rio de Janeiro / UFRJ

Brazilian Portuguese (BP) has been considered a *Partial Null Subject language* with the following properties: optional referential null subjects (RNS), null generic subjects, and null expletives. The aim of this paper is to discuss the nature of the optionality of RNSs. The cases of null generic subjects, partially attested in BP, and of null expletives are not under discussion. Using the macro-parametric view of the NS Parameter, we jointly discuss the possibility of RNSs and of free inversion in present-day BP. With regard to the latter, we will propose first that partial loss of free inversion has to be relativized in terms of prosodic weight. With regard to the former, we propose that optional RNS is felicitous when a sentence has a linear V2 (non-Germanic) pattern at PF, with the presence of a short cliticized element, which includes the subject pronoun. In both cases, we claim that BP has a filter at PF (Avoid V1).

### **1 Introduction**

#### **1.1 The problem**

Brazilian Portuguese (BP) has been known to have partially lost the properties of the Null Subject Parameter (NSP), as conceived in its macro-parametric view,<sup>1</sup> since the middle of the 19th century.<sup>2</sup> Two major explanations can be given for this partiality:

<sup>1</sup>Cf. Rizzi (1982), Chomsky (1981).

Mary A. Kato & Maria Eugenia Lammoglia Duarte. 2023. The partial loss of free inversion and of referential null subjects in Brazilian Portuguese. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 41–62. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525096

a) the changes are still in progress, and the partiality has to do with the fact that the changes are not completed yet (see Figure 1).<sup>3</sup>

Figure 1: Null subjects (vs overt) in popular theater plays across two centuries (Duarte 1993)

b) the partial aspects of the change have to do with the fact that there are partial NS languages, among which BP,<sup>4</sup> with *optional* referential human and non-human NSs (1a–1b and 2a–2b), null expletives (3a–3b) with existential and weather verbs, and, we add, free inversion restricted to unaccusative verbs (4a-4b) reanalyzed as existentials.<sup>5</sup>

<sup>2</sup>Cf. Tarallo (1993); Duarte (1995); Kato (1999, 2000a); Kato & Duarte (2017).

<sup>3</sup>Cyrino et al. (2000: 58–59) propose a referential hierarchy that guides changes concerning pronominalization. Under their hypothesis, [+N, +human] arguments, the speaker and the addressee, are the highest in the referential hierarchy, and a proposition, the lowest. [−human] entities are in between, with the [−animate] entity interacting with [+animate/+human] ones. The feature [+/−specific] interacts with all the other features. This explains why the change towards overt pronouns affects 1st and 2nd persons first, whereas 3rd person, exhibiting [+/−human] referents, shows a slower increase of overt pronominal subjects, as shown in Figure 1.

<sup>4</sup>Cf. Holmberg & Sheehan (2010).

<sup>5</sup>Cf. Kato & Tarallo (1993), Kato (2000b, 2002a). According to the latter, unaccusatives have been reanalyzed in BP as existentials, which is why the verb is always in the 3rd person singular.

	- b. Meu my marido<sup>i</sup> husband foi was quase almost preso arrested aí there no in.the fort fort porque because **Ø**<sup>i</sup> foi went mergulhar. dive.

'My husband was almost arrested at the fort because he went for a dive.'

	- b. O the sistema system público<sup>i</sup> public é is totalmente totally diferente different de of empresas companies privadas. private **Ø**<sup>i</sup> não not funciona works da of.the mesma same maneira. way

'The public system is totally different from private companies. It does not work the same way.'

	- b. Aqui here **Ø**exp não not chove rains muito. lot 'It doesn't rain very often here.'
	- b. \* Riu laughed a the plateia. audience 'The audience laughed.'
	- c. \* Deu gave uma a carta letter pra to ela her o the Pedro Pedro 'Pedro gave her a letter.'

The optionality shown in (1) and (2) does not mean that NSs are frequent in spoken BP. Recent empirical research (Duarte 2020) shows that BP is losing crucial properties of Romance NS languages, such as the anti-c-command and c-command relations between subjects, contexts where a null subject is categorical in European Portuguese (EP). In BP, the anti-c-command environment shows the lowest rates of null subjects in a postposed main clause (around 11% of the data), whereas the c-command environment, with a subordinate clause following its main clause, reaches 40%; both rates are very distant from those attested for EP, 93% and 94% of null subjects, respectively. This gradual result for E-language confirms that the value of the NSP has already been reset in BP. As for restricted "free" inversion, we will show that, in spite of some possible contexts, it is restricted to thetic sentences with unaccusatives and that the variation SV/VS in the same context is under way, preferably with definite DPs with a [+human] semantic feature, an important step in the change.

#### **1.2 The Aim**

The aim of this study is to show that variation/optionality in the properties of the NS parameter (NSP) in BP, attested today in writing and planned speech, has a stylistic or prosodic character (Kato 2013a), and does not constitute morphological or syntactic *doublets*, in the sense of Aronoff (1976) and Kroch (1994).<sup>6</sup> §2 will discuss the optional character of free inversion, and §3 will discuss the optional possibility of Null Subjects (NSs). §4 will synthesize what was described in the previous sections to see if we can sketch a common explanation in terms of trigger or consequence(s) of the changes. In the last section we will draw the conclusions. Our diachronic data come from popular plays written in Rio de Janeiro; synchronic data have been collected in recent interviews recorded in Rio de Janeiro between 2009 and 2010 (available at www.corporaport.letras.ufrj.br).

### **2 The partial loss of free inversion**

#### **2.1 Free inversion in Romance**

Comparing BP with other NS Romance languages, Kato (2000a,b) noticed that free inversion is more easily found when the objects are clitics. First, according to Bentivoglio & Introno (1978), inversion is easily found in Spanish when the complements are clitics. The same is found to be true in Italian by Benincà & Salvi (1988):

<sup>6</sup> "Syntactic heads, we believe, behave like morphological formatives generally in being subject to the well-known 'Blocking Effect' (Aronoff 1976), which excludes morphological doublets, and more generally, it seems, any coexisting formatives that are not functionally differentiated. This exclusion, however, does not mean, either for morphology or for syntax, that languages never exhibit doublets. Rather it means that doublets are always reflections of unstable competition between mutually exclusive grammatical options." (Kroch 1994: 181)

	- b. Quería wanted hacer=**lo** do=it Juan. John 'John wanted to do it.'
	- b. ? Há has mangiato eaten la the torta pie **la** the **mamma**. mommy 'Mommy has eaten the pie.'

In BP, this constraint has been aggravated by the fact that it has lost part of its clitics, particularly those belonging to the 3rd person paradigm, which made the right side of the verb heavier. The examples in (7a, 7c and 7e) show that the sentences are ungrammatical in BP because this variety lacks the relevant clitics, and in (7b, 7d and 7f), they are ill-formed because the right-hand side of the verb is too heavy.



This shows that free inversion in Romance NS languages is constrained by phonological weight, a fact that made Zubizarreta (1998) propose that predicate inversion in Romance results from a general predicate movement, which is constrained by prosody, called *P-movement*. The existence of clitics and their light nature explains why predicates with clitics are found in Romance inversion. However, in BP, such a constraint has been aggravated not only by the loss of 3rd person clitics, which are replaced by weak pronouns optionally null in anaphoric contexts (cf. Kato & Tarallo 1993), but also by the fact that the 2nd person clitic *te* is in variation with the weak pronoun *você* 'you', originally an address form, *Vossa Mercê* 'Your Grace', that has fully grammaticalized as a pronoun in BP and is used in nominative, accusative, and oblique functions (Lopes & Brocardo 2016). Since its implementation through the 20th century, there has been a competition with canonical 2nd person *tu* 'you' and the accusative and oblique forms associated with it. Today, although their distribution is mainly diatopic, *você* outnumbers *tu*. Table 1 shows the changes in the paradigm and in the placement of the remaining clitics (cf. Kato 1993).<sup>7</sup>


Table 1: 2nd and 3rd person accusative pronominal paradigms (clitics and weak pronouns)

#### **2.2 V1 vs V2 in BP free inversion**

As we realized that the partial loss of free inversion was triggered by the substantial loss of clitics, we considered the possibility that it was independent of the loss of the NS. However, in a research project on spoken BP, Kato (2002b) and Kato & Duarte (2003) concluded that this variety of Portuguese rejects V1 structures in free inversion with transitive and intransitive verbs, filling the preverbal position, when possible, with a short (light) item, even a discursive one. The authors associated this restriction with the new prosodic rhythm of the language, a consequence of the resetting of the NSP.

<sup>7</sup> See Nunes (2019), for whom the null object in BP is an object agreement.

	- b. \* Vai goes **ali** there a the Maria. Maria 'There goes Maria.'
	- c. **Lá** there vem comes o the bonde. tram 'There comes the tram.'
	- d. ? Vem comes **lá** there o the bonde. tram 'There comes the tram.'

Pilati (2006) proposed that inversion is obtained more easily in BP when a deictic or a locative element satisfies the EPP, occupying the verb initial position:

	- b. **Aqui** here dormem sleep **as** the **crianças**. children 'The children sleep here.'

Buthers & Duarte (2012) proposed that, as BP has become *a partial NS language* with optional referential NSs, there has been an increasing tendency to avoid null expletives in VS structures, and locatives have become grammaticalized as lexical expletives, just like *there* in English, which license VS even with transitive verbs, albeit less frequently:

(10) a. **Lá** there vai goes **o** the **time** team **de** of **futebol**. soccer 'There goes the soccer team.'

b. **Aqui** here constrói builds **um** a **país**. country 'Here a country is built.'

As such a constraint increases, the pattern XP V (YP) also does, affecting especially impersonal constructions, which still allow null expletives:

	- b. **São Paulo** São Paulo chove rains 'It rains in São Paulo.'
	- c. Øexp Faz does frio cold em in Curitiba. Curitiba 'It is cold in Curitiba.'
	- d. **Curitiba** Curitiba faz does frio. cold 'It is cold in Curitiba.'

### **3 The partial loss of NSs in Brazilian Portuguese**

#### **3.1 The ongoing loss of referential NSs in BP and their recovery through instruction**

While the data attested in plays from the 19th century and the beginning of the 20th century show BP as a consistent NS language (see 12), from the 1950s on, BP has been described as having lost the referential NS in most contexts (see 13) and preserved the non-referential null expletive (see 14).<sup>8</sup>

(12) a. Ontem yesterday Ø1ps Ø1ps comprei-**lhe**<sup>i</sup> bought-**him**<sup>i</sup> o the hábito costume com with que which Øi 3ps Øi 3ps andará be.fut vestido. dressed 'Yesterday I bought him the costume he will wear.' (*O noviço*, Martins Pena, 1845)

<sup>8</sup> See Duarte (1995); Figueiredo Silva (1996); Modesto (2000) *inter alia*

b. Ø2ps Ø2ps Terá have.fut o the cavalo horse que that Ø2ps Ø2ps deseja. wish 'You will have the horse you wish.'

(*O simpatico Jeremias*, 1918, Gastão Tojeiro)

	- b. Se if **eu** I ficasse stayed aqui here **eu** I ia would querer want ser be. a the madrinha. godmother 'If I stayed here I would like to be the godmother.'

(*No coração do Brasil*, M. Falabella, 1992)


(*Os irmãos das almas*, Martins Pena, 1845)

b. E and Øexp **tem** has o the quarto room da of.the empregada maid lá fora.<sup>9</sup> outside 'And there is a maid's room outside.'

(*Um elefante no caos*, Millôr Fernandes, 1955)

<sup>9</sup>Existential *haver* 'there is/are' has been replaced in speech by the possessive *ter* 'to have', which keeps the possessive meaning and the innovative existential meaning, as in *Na esquina tem uma livraria* 'On the corner has a bookstore'; *ter* has the advantage of allowing the projection of Spec, TP, as in *Eu tenho uma livraria na esquina* 'I have a bookstore on the corner', which suits the change in progress (Duarte 2003).

Let us see what happens in the acquisition of BP considering such a change. As claimed by Lightfoot (1999), children's core grammar does not have *doublets*, containing only the innovative form. Confirming this claim, Simões (2000) shows that Brazilian children do not have NSs for referential subjects as in (15), keeping them for expletives only as in (16).

(15) a. **Eu** I to am botando throwing Ø (null object). 'I am throwing it (out).'

b. Não neg quer want Ø, **tu** you não neg quer want Ø? 'She/he doesn't want it, you don't want it?'

c. **Ela** she anda rides a a cavalo, horse anda rides de of moto, motorcycle **ela** she anda. walks 'She rides a horse, rides a motorcycle, she gets around.' (André, 2;4) (exs. from Simões 2000)

(16) a. Øexpl Tem have.3sg dois two aviões. airplanes 'There are two planes.'

b. Øexpl É is esse this que that cabe. fits 'This is the one that fits.' (André, 2;4) (exs. from Simões 2000)

In BP, however, NSs are shown to be recovered by instruction at school (Magalhães 2000) (See Table 2).


Table 2: Null subjects recovered through schooling (Magalhães 2000)

With only 2.11% of NSs in the first grade, Magalhães shows that optionality of NSs results from schooling. By the 7th grade, adolescents start behaving like literate adults.

(17) a. Ø1ps (I) vou fut pedir ask uma a ordem prescription ao to.the médico doctor porque because **eu**<sup>1</sup> I não not aguento stand ver to.see você you sofrer suffer mais. anymore 'I'm going to ask the doctor for a prescription because I can't stand to see you suffer anymore.' (7th grade)

b. **Eu**<sup>1</sup> I estou am de of castigo, punishment porque because Ø1ps (I) briguei argued com with minha my irmã sister e and Ø1ps (I) não not vou fut poder can jogar play futebol soccer hoje today. 'I have been punished because I argued with my sister and won't be able to play soccer today.' (7th grade)

The NS acquired through schooling is not part of the child's core grammar, and it can be said that NS in the writing of literate adults is part of a second grammar in *the periphery* of his/her I-language.<sup>10</sup> The variation/optionality that we find in students' and literate Brazilians'<sup>11</sup> writing is like the phenomenon of code-switching, and the effect is stylistic. Looking at Table 2, at the distribution of 3rd person NSs in the speech of Brazilian adults, we can see that it is much below what we have with Europeans, which shows a decline in progress. But in writing, Brazilians show a recovery of more than 20% in the use of NSs, compared to their speech.

Table 3: Null subjects recovered through schooling (Magalhães 2000)


<sup>10</sup>According to Chomsky (1988), the adult's I-language may contain an extended periphery, with old forms, or even a mixture with a second language, like what heritage speakers tend to do.

<sup>11</sup>Research on the speech of literate adults shows that acquired/learned null subjects, 3rd person clitics, existential *haver* 'there is/are', and so many other features that are not in the primary acquisition data are not carried over to their spontaneous speech (Duarte 1995; Freire 2000; Duarte 2003, among many others).

Notice that the variation between NSs and pronominal subjects in the Brazilian adult is very similar to that of 7th graders, whereas NSs are frequent in both spoken and written EP. Considering that the overall rate of null subjects in BP is around 28%, we can say that schooling shows a relative "success" by reaching about half of what EP writing reveals. The optionality is illustrated in BP standard writing.

	- b. Ele<sup>i</sup> he explicou explained que that à at tarde afternoon ele<sup>i</sup> he vai fut avaliar evaluate todas all as the alternativas. alternatives

'He explained that he would evaluate all the alternatives in the afternoon.'

c. A the França<sup>i</sup> France se itself prepara prepares para for o the ataque attack ao to.the inimigo<sup>2</sup>. enemy Ø<sup>i</sup> Sabe knows que that ele<sup>2</sup> 3sg se itself aproxima, approaches mas but Ø<sup>i</sup> não not sabe knows exatamente exactly quando when Ø<sup>2</sup> aparecerá. appear.fut

'France is getting ready for the attack to the enemy, but does not know when it will appear.'

d. O the governo<sup>i</sup> government não not considera considers o the fato fact de of que that os the possíveis possible desvios misappropriations e and exageros exaggerations de of que which ele<sup>i</sup> 3sg tanto so\_much se refl queixa complains são are criticados criticized na in.the própria very imprensa. press 'The government does not consider the fact that possible misappropriations and exaggerations about which it so often complains are criticized by the press itself.'

Concluding this section, we can say that NSs in the I-language of literate Brazilians are a function of stylistic prescriptions imposed by instruction.

#### **3.2 Optionality cases independent of competing grammars**

With the ongoing change from [+NS] to [−NS], Duarte (1995) found that when the finite verb was preceded by a short item, the subject pronoun could be *optionally* null, although overt pronouns are preferred.

	- b. (Cê) (You) **Nunca** never ouviu heard.2sg falar talk nele?<sup>12</sup> about.him 'You never heard talk about him.'
	- c. (Ele) (He) **Não** Not aguentou stood.3sg o the tranco. pressure 'He didn't stand the pressure.'
	- d. (Eu) (I) **Me** refl.cl tornei became.1sg independente. independent 'I became independent.'

We have tested the following group of sentences, and noticed that they are all well-formed except for (20a).

	- b. **Eu** I como eat.1sg cenouras carrots orgânicas. organic 'I eat organic carrots.'
	- c. Ø **Não** not como eat.1sg cenouras carrots orgânicas. organic 'I don't eat organic carrots.'
	- d. Ø **Só** only como eat.1sg cenouras carrots orgânicas. organic 'I only eat organic carrots.'

<sup>12</sup>The short or light elements we refer to are clitics, negation, and light adverbs, located inside the TP, usually between Spec,TP and V. Adverbs and other elements located in the left periphery of the sentence, like interrogative and relative pronouns, are contexts where null subjects are almost completely lost.

The only difference between (20a) and the others is that the former starts with the main verb while the others are introduced by a short element (cf. Duarte 1995), avoiding a V1 sentence pattern. Just as we showed a prosodic constraint disfavoring V1 in free inversion, we can say here that ordinary affirmative sentences in BP favor V2<sup>13</sup> as a general pattern. But it seems that in BP the first sentential element does not have to be the subject pronoun; as seen in (20), it can be some short element that cliticizes to the verb.

The usual cases of verb in initial position are in answers, but the derivation of such answers have to do with movement of the verb to the Focus initial position, followed by remnant erasure of the predicate (cf. Kato 2013b):


'It has.'

Traditionally, when the notion of the NS Parameter was introduced, the NS was conceived as a *pro*, which had to be licensed and identified (Rizzi 1982). More recently, Holmberg (2005) and Roberts (2010) revived an old idea of Perlmutter's (1971), who conceived NSs as resulting from a pronominal deletion process. In this paper, borrowing ideas from Kato & Duarte (2014, 2018), we will also conceive the referential NSs of prototypical NS languages as deleted *weak* pronouns.<sup>14</sup> The notion of licensing, required by the concept of *pro*, does not allow the idea of optionality. But the notion of deletion can place the phenomenon at PF, the level that defines stylistic rules according to Chomsky & Lasnik (1977).

<sup>13</sup>This "linear" V2 order should not be confused with the structural Germanic V2.

<sup>14</sup>Kato (1999) used the idea of weak *vs.* strong pronouns from Cardinaletti & Starke (1999).

#### **4 From triggers to consequences**

#### **4.1 The role of rich morphology as the licensing condition (the trigger) for the referential NS**

That rich morphology is a licensing condition for the null subject in "consistent" NS languages (Roberts & Holmberg 2010) has been one of the most prevalent hypotheses, with diachronic evidence to support it. The changes that occurred in Old French (Adams 1987, Roberts 1993) and in BP (Duarte 1993, Kato 1999) support the rich Agr hypothesis, as it was the reduction of the Agr paradigm that led to the general loss of *pro* in the former and the loss of referential *pro* in the latter. Similar facts have been attested for Caribbean Spanish, which lost the NS in all contexts, like Dominican Spanish (Toribio 1993), or in referential contexts like Puerto Rican Spanish.

The empirical facts that support the Agr identification hypothesis come from Duarte (1993), who shows that in the main dialects of BP the grammaticalization of the old address form *você* (from *Vossa Mercê* = 'Your Grace'), which is associated with the 3rd person verb form, led to an inflectional paradigm in which the three-person distinction was lost. *Você* is now in competition with 2nd person pronoun *tu* 'you', which also lost its canonical distinctive ending in speech. The inflectional reduction has been aggravated by the grammaticalization of another nominal expression *a gente* 'the people', which entered the BP pronominal system in competition with *nós* 'we', very similar to French *on/nous*, and also requires 3rd person singular agreement (see Lopes & Brocardo 2016).

For Galves (1993), the consequence of such changes in the BP inflectional paradigm was a change in the strength of the AGR head, making it [−person]. One possible assumption regarding the partial loss of referential null subjects in BP was that the change was triggered by the loss of its rich inflectional morphology (Duarte 1995, 2018) and the acquisition of new free weak pronouns instead of a regular pronominal inflection (Kato 1999). See Table 5 for the inflections in the contemporary paradigm and the reduction in the realization of the personal pronouns.

#### **4.2 The loss of the rich inflection paradigm in BP and its new linear V2 sentential pattern**

Until the 19th century, BP had a rich inflectional paradigm with one ending for each person, which qualified it as a type of *weak* pronominal-like clitic (cf. Kato 1999). We propose that, as a consequence of the loss of some of the bound person inflection, we started having the spell-out of free weak pronouns in sentence initial position. As a consequence, we had a parallel *change in terms of sentential prosody*. While, before the change, examples with NSs exhibited a V1 sentential pattern, they now appeared as a V2 pattern with overt subject pronouns.

Table 4: Evolution of the pronominal and inflectional paradigm in BP in two centuries (Duarte 2018)

Table 5: Personal pronoun contemporary paradigm reductions


Table 6: Changes in BP sentential patterns

The same happened with free inversion, which is more acceptable when the first position is occupied by some constituent, resulting in a linear V2 pattern (Kato & Duarte 2018).


Table 7: V1 and V2 with free inversion

We can conclude that BP is a sort of partial NS language with remaining NSs and free inversion, both strongly constrained by prosodic factors. Regarding the changes that BP underwent, we can say that morphology was the trigger to change the value of the parameter, but prosody was the consequence.

#### **5 Conclusions**

In this work, we proposed that a stylistic prosodic rule affected both free inversion and NS constructions in BP.

	- b. \* Vem comes lá there **a** the **Maria**. Mary (V1) 'There comes Mary.'
	- c. Você you é are americano? American (V2) 'Are you American?'
	- d. \* É are americano? American (V1) 'Are you American?'

We may consider that, at the PF interface, languages have filters regarding their rhythm. To account for the preference for certain forms at this stage of the change in progress in BP, we propose a constraint of the form *"Avoid V1"*. This constraint has nothing to do with an XP constituent in Spec of C with the verb in C, as in V2 Germanic languages, but with a prosodic requirement. This means that the initial element can be a head or an XP.

We may conjecture that the rhythmic pattern acts as a sort of parameter, just like morphology. The child would probably be more sensitive to prosody (first) than to morphology.

#### **Acknowledgement**

We would like to thank Marcello Marcelino for his usual revision of our articles.

#### **References**


### **Chapter 4**

## **The antipassive as a Romance phenomenon: A case study of Italian**

Karina High<sup>a</sup>

<sup>a</sup>University of Texas at Austin

This study focuses on the Italian pronominal verbs *lamentarsi* 'lament/complain', *ricordarsi* 'remember/remind', *vantarsi* 'praise/boast' and their transitive counterparts and analyzes their distribution from the 13th to the 21st century across different syntactic environments, with particular attention to logical object expressions. It explores the possibility of an antipassive (AP) analysis, thereby adding a Romance perspective to the growing research on the historical development of the AP. The pronominal constructions of the sample that select an oblique complement display structural characteristics typical of the AP. Namely, they contain a demoted logical object, are structurally intransitive and semantically transitive, mark the oblique using the preposition *di*, display a detransitivizing "AP morpheme" *si*, and have a transitive counterpart. For all three verb pairs, there is initially a high frequency of AP constructions (13th–15th centuries), followed by a decrease in favor of transitive constructions with a direct object complement.

#### **1 Introduction**

This study examines the distribution of a particular class of pronominal verbs and their transitive counterparts in Italian from the 13th to 21st centuries and explores diachronic and synchronic evidence for the antipassive (AP) construction. The verbs in question are *lamentare/lamentarsi* 'lament, complain, moan', *ricordare/ricordarsi* 'remember, remind', and *vantare/vantarsi* 'praise, boast'.

Karina High. 2023. The antipassive as a Romance phenomenon: A case study of Italian. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 63–84. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525098

As shown in (1a), the pronominal verb is semantically transitive and is characterized by the realization of the logical object<sup>1</sup> as an oblique complement, while its transitive form in (1b) selects a direct object complement.<sup>2</sup>

(1) a. Dopo after aver have.aux.inf cercato search.pst.ptcp dappertutto everywhere **si** se.3sg **ricordò** remember.pfv.pst.3sg **del** of.def.det.msg sogno dream.msg e and corse run.pfv.pst.3sg in in gardino, garden.msg vicino near al to.def.det.msg fiume, river.msg dove where dormendo sleeping l' her aveva have.aux.ipfv.pst.3sg veduta. see.pst.ptcp (*I racconti delle fate*, 1876)

'After having searched everywhere, he remembered the dream and ran into the garden, near the river where sleeping, he had seen her.'<sup>3</sup>

b. Chiunque whoever **ricordi** remember.sbjv.prs.3sg la def.det.fsg vita life.fsg italiana Italian.fsg al to.def.det.msg principio beginning.msg del of.def.det.msg secolo century.msg non neg potrà can.fut.3sg non neg sottoscrivere subscribe.inf a to questo this.msg apprezzamento. comment.msg (*Pensiero e azione del risorgimento*, 1943)

'Whoever remembers the Italian life at the start of the century, cannot not subscribe to this comment.'

The effect in (1a) is a change in the valency of the verb, as the number of arguments is reduced. A similar type of valency-reducing strategy called the AP has been studied in ergative languages and, increasingly, in accusative languages. In AP constructions, the logical object is realized as a non-core argument or is omitted (but remains presupposed).

This paper analyzes how such semantically related transitive and pronominal verbs pattern diachronically and whether the diachronic perspective provides evidence for the AP construction.

<sup>1</sup> I am using the term "logical object" following Polinsky (2017). Others, such as Creissels (2012), Janic (2013), and Sansò (2017, 2019), refer to it as 'patient'.

<sup>2</sup> It is also possible to find *ricordarsi* followed by a direct object as an alternative to the construction in (1a). The use of this particular construction increases over time and is more frequent than the construction in (1a) in the 21st century. Examples of this construction are found in (5) and (9c) and are further examined in the Discussion.

<sup>3</sup>Unless otherwise indicated, all translations in this paper are my own.

The organization of the paper is as follows. In §2, I first discuss current research about Romance pronominal verbs and the Romance clitic *se*, as well as typological and historical work on the AP. In §3, I describe the data sources, the data collection and coding process. In §4, I present my findings. In §5, I relate these to a discussion of the AP and examine the possibility of an AP analysis and the mechanisms of diachronic change. Finally, I conclude with some remarks about the main findings and ideas for future research.

#### **2 Romance se and previous analyses**

The pronominal verbs of this study attest to the heterogeneous nature of Romance *se*, which is traditionally labeled reflexive. Accounts from the prescriptive tradition struggle to find a classification of pronominal verbs that captures the polyfunctionality of this clitic. Analyses have been proposed to describe different uses of Romance *se*. For instance, Melis (1985, 1990a,b) expanded on the classical grouping for French pronominals, while Nishida (1994) identified uses of Spanish *se* as an overt aspectual class marker. For Italian, Cennamo's extensive work sheds light on several phenomena regarding the diachronic development of the Late Latin/Early Romance reflexive *se/sibi*. Among other things, she identified the use of the pleonastic reflexives *se/sibi* with intransitive verbs as markers for Split Intransitivity (1999) and studied the expansion of the domain of reference (1993) and the continuum of prototypical and less prototypical/grammaticalized uses of *se, sibi, suus* in Late Latin Christian inscriptions (1991).

Evidence has been discovered that suggests the existence of the AP construction in Romance. For Spanish, Masullo (1992) proposes the derivation of an AP *se* as a direct object in the deep structure, which is then incorporated into the verb; for Slavic and Romance, Medová (2009: vii) argues that "the reflexive clitic *se* is an AP morpheme of the sort known from the ergative languages" and proposes a parallel derivation for inherent reflexives and APs.

Typological studies of the AP, such as Polinsky (2017), present and discuss various manifestations of this construction and weave together characteristics shared across ergative and accusative languages, which can serve as a set of diagnostics to identify the AP. To date, the historical work on the AP has been limited to non-Romance languages; for instance, Terrill (1997), Creissels (2012), Janic (2013), and Sansò (2017, 2019) identified the reflexive construction as one of several sources of the AP marker. These studies, however, do not look for supporting evidence from Romance; moreover, the current research on the Romance AP does not adopt a diachronic perspective.

#### **3 Data and methods**

The data of this study is drawn from online databases and annotated corpora, as well as collections of texts that are accessible online. The sources include the corpus *Opera del Vocabolario della lingua italiana* (OVI), *Tesoro della Lingua Italiana delle Origini* (TLIO), the *Biblioteca dei Classici Italiani*, *IntraText*, the *Corpus Diacronico dell'Italiano Scritto* (DiaCORIS), and the *Corpus di Italiano Scritto* (CORIS/CODIS). Together they cover a period from the 13th until the 21st century and include literary (e.g., novels, poems, plays, operas) and non-literary texts (e.g., religious texts, journalistic writings, essays, and correspondence).

For each verb pair, I randomly selected 60 tokens per century or, if there were not sufficient data, included all occurrences available. This was the case for *vantar(si)*, which records only 40 tokens for the 13th century. Finite and nonfinite verb forms are equally included in the dataset. I excluded passive forms and a handful of tokens that had prominent non-Tuscan characteristics, as illustrated in (2), an excerpt in the Venetian dialect from Carlo Goldoni's comedy, *I Rusteghi*.

(2) Cossa what songio? be.prs.1sg un indf.det.msg tartaro? Tartar.msg una indf.det.fsg bestia? beast.fsg **De** of cossa what **ve** se.2pl **podeu** can.prs.2pl **lamentar**? lament.inf Le def.det.fpl cosse thing.fpl oneste honest.fpl le they.f me me piase please.prs.3sg anca also a to mi. me (*I Rusteghi*, 1760)

'What do you think of me? A Tartar? A brute? What have you to complain of? I don't object to honest pleasures.' (from Goldoni 1961: 109)

The queries resulted in a total of 1600 tokens. In order to isolate and study the distribution of the verb pairs in general and the verbs with a logical object, I grouped the data into two large categories based on the presence or absence of the clitic *si*. These categories I labeled TR, referring to verbs without *si* (e.g., *lamentare*), and PRO, referring to verbs with *si* (e.g., *lamentarsi*). In addition to author, title, and date, I also coded transitivity (Intrans/AP/Trans), type of phrasal complement (NP, CP, PP, Null), meaning, and type of logical object (IO/DO).<sup>4</sup>

<sup>4</sup> I also coded auxiliary selection for compound tenses. However, the pattern was exceptionless: PRO verbs selected the auxiliary *essere* 'to be' (like reflexive verbs), while TR verbs selected the auxiliary *avere* 'to have' (like transitive verbs).
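The sampling and coding procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and field names (`sample_tokens`, `CodedToken`, `expected_total`) are hypothetical, the sampled record uses placeholder values rather than the actual coding, and the token count assumes that every verb pair reaches 60 tokens in every century except *vantar(si)* in the 13th century, which is consistent with the reported total of 1600.

```python
import random
from dataclasses import dataclass

TOKENS_PER_CELL = 60  # sample size per verb pair and century, as stated in the text
VERB_PAIRS = ["lamentar(si)", "ricordar(si)", "vantar(si)"]
CENTURIES = range(13, 22)  # 13th through 21st century


def sample_tokens(tokens, k=TOKENS_PER_CELL, seed=0):
    """Return k randomly chosen tokens, or all of them if k or fewer exist."""
    tokens = list(tokens)
    if len(tokens) <= k:
        return tokens
    return random.Random(seed).sample(tokens, k)


def classify_form(has_si):
    """PRO = verb with the clitic si; TR = verb without it."""
    return "PRO" if has_si else "TR"


@dataclass
class CodedToken:
    """One coded occurrence; the field names are hypothetical."""
    author: str
    title: str
    year: int
    form: str            # "TR" or "PRO"
    transitivity: str    # "Intrans", "AP", or "Trans"
    complement: str      # "NP", "CP", "PP", or "Null"
    meaning: str
    logical_object: str  # "IO", "DO", or "" when no logical object is expressed


def expected_total(short_cells):
    """Token total if every cell holds 60 except those listed in short_cells."""
    return sum(
        short_cells.get((pair, century), TOKENS_PER_CELL)
        for pair in VERB_PAIRS
        for century in CENTURIES
    )


# Assumption: only vantar(si) in the 13th century falls short (40 tokens).
total = expected_total({("vantar(si)", 13): 40})
print(total)

# Placeholder record modeled on example (1a); values are illustrative only.
example = CodedToken(
    author="(not given)", title="I racconti delle fate", year=1876,
    form=classify_form(has_si=True), transitivity="AP",
    complement="PP", meaning="remember", logical_object="IO",
)
```

Capping the sample at the number of available tokens mirrors the treatment of sparse cells described above, where all occurrences are kept when fewer than 60 exist.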

#### **4 Analysis**

In the dataset, PRO forms are more common overall at 72% (1153 tokens) compared with TR forms at 28% (447 tokens). The overall distribution is heavily influenced by the fact that the PRO forms represent 73%–91% of the data for six centuries, from the 13th to the 18th century, but decline thereafter. Figure 1 shows the trends over time for the three verb pairs. For each verb pair, the solid line represents the PRO form and the dotted line represents the TR form; together they total 100%. In addition, the shadowed area displays the mean percent TR for all verbs over time.

Figure 1: Distribution across time

From the 16th through the 18th century, the trendline documents an increase in the TR construction and a decrease in the PRO construction until they reach almost equal distribution in the 19th century. The individual verb pairs differ slightly with respect to their distribution across time. In the 19th century, *vantare* becomes more frequent than *vantarsi* (TR = 68.3%), *ricordare* barely surpasses *ricordarsi* (TR = 51.7%), and *lamentarsi* drops in frequency but continues to be more common than *lamentare* (PRO = 68.3%). This trend continues into the 21st century.<sup>5</sup>

<sup>5</sup>The unexpected dip in the trendline for *vantare*, which occurs in the 20th century, may be due to noise in the data.
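The percentages reported in this section can be checked with simple arithmetic. The sketch below recomputes the overall PRO/TR shares from the stated token counts. The 19th-century counts (41 and 31 out of 60) are not given in the text; they are inferred here on the assumption that each verb pair contributes 60 tokens in that century, and they reproduce the reported 68.3% and 51.7%.

```python
def percent(count, total, digits=1):
    """Share of count in total, as a percentage rounded to the given digits."""
    return round(100 * count / total, digits)


# Overall distribution, using the token counts stated in the text.
pro_tokens, tr_tokens = 1153, 447
dataset_total = pro_tokens + tr_tokens
print(dataset_total)
print(percent(pro_tokens, dataset_total, 0), percent(tr_tokens, dataset_total, 0))

# 19th century: inferred counts, assuming 60 tokens per verb pair.
ASSUMED_CELL_SIZE = 60
inferred_counts = {
    "vantare (TR)": 41,
    "ricordare (TR)": 31,
    "lamentarsi (PRO)": 41,
}
for label, count in inferred_counts.items():
    print(label, percent(count, ASSUMED_CELL_SIZE))
```

Because the PRO and TR shares of a verb pair are complementary within a century, a TR share of 68.3% for *vantare* implies a PRO share of 31.7% for *vantarsi* under the same assumption.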

#### **4.1** *Lamentar(si)*

As represented in Table 1, *lamentare* most frequently occurs in constructions involving a direct object complement (43.16%) and intransitive constructions with a null object complement (37.89%), while *lamentarsi* most often selects a null object complement (40.67%) or an indirect object complement (37.08%).


Table 1: Syntactic environments of *lamentar(si)* (13th–21st c.)

Over time, there is an increase in the selection of a finite CP for both verbs. The most frequent TR construction in the 13th century is the intransitive construction with a null object complement (70%), which is surpassed by the direct object complement in the 21st century (78.95%). *Lamentarsi* most commonly selects an indirect object or a null object complement in the 13th century, at 44% and 46% respectively. By the 21st century, the frequency of the PP complement has decreased considerably (21.95%), while that of the null construction has increased (58.54%). The PRO and TR verbs overlap significantly in meaning. With or without an object, they most commonly have the meaning 'mourn, lament, complain (about)', as in (3a) for PRO and (3b) for TR. As in (3a), the object is introduced mostly by the preposition *di* 'of' or less often by *per* 'for'. As an intransitive, the non-pronominal form can also refer to the act of emitting sound while lamenting or suffering, that is, 'wail, moan'. This latter meaning is not shared with the PRO verb.

(3) a. E and molto much **si** se.3sg **lamentava** lament.ipfv.pst.3sg **di** of Guerrino, Guerrino cioè, that.is **della** of.def.det.fsg sua his.fsg morte death.fsg e and di of Bernardo Bernardo suo his.msg fratello, brother.msg ch' who era be.aux.ipfv.pst.3sg preso, take.pst.ptcp ma but non neg sapeva know.ipfv.pst.3sg dove where s' se.3sg era, be.aux.ipfv.pst.3sg s' if egli he era be.aux.ipfv.pst.3sg preso take.pst.ptcp o or morto. dead.msg (*I reali di Francia*, 1491)

'And much did he [i.e., Gherardo] mourn Guerrino, that is, his death, and Bernardo his brother, who was taken captive, but he did not know where he was, if he had been taken captive or was dead.'

b. Quella that.fsg dispicca take.off.prs.3sg un indf.det.msg vol flight.msg sopra on il def.det.msg pollone shoot.msg D' of un indf.det.msg vecchio old.msg salcio, willow.msg e and colassù up.there **lamenta** lament.prs.3sg Il def.det.msg suo his.msg timor dread.msg pe' for tenerelli tender.mpl aspetti: aspect.mpl (*Il primo bacio*, 18th c.)

'And she [i.e., the hen] flies off up to the branch of the old willow and from there laments her dread for her tender little ones.'

These examples document the most common environments of *lamentar(si)* involving a logical object. As indicated in Table 1, less common constructions include ones where the PRO verb selects an NP complement as in (4a) and the TR verb occurs with a PP complement as in (4b), as well as constructions with the impersonal *si*, reflexive *si*, and adjectival phrase complements.

(4) a. PRO + NP

Udite hear.prs.2pl tucti all.mpl comunamente together come how Dio God omnipotente almighty.msg **si** se.3sg **lamenta** lament.prs.3sg chi who l' him ofende, offend.prs.3sg et and duramente harshly li them.m riprende reprimand.prs.3sg di of ciò that che which tucte all.fpl criature, creature.fpl segondo according.to le def.det.fpl loro their nature, nature.fpl connosceno know.prs.3pl lo def.det.msg lor their criatore creator.msg meglo better che than l' def.det.msg omo man.msg a at tucte all.fpl hore; hour.fpl (*Quindici segni del giudizio*, 1270)

'Hear all together how God Almighty laments those who offend him and harshly reprimands them with respect to the fact that all creatures according to their nature know their creator better than man at any hour;'

b. TR + PP

Vedro=gli see.fut.1sg=him/them.m in in un indf.det.msg voler want tutti all.mpl dispor=si arrange.inf=si con with meco with.me a to **lamentar** lament.inf **della** of.def.det.fsg mie<sup>6</sup> my.fsg pena, trouble.fsg fin until che' that pianeti planet.mpl aràn have.aux.fut.3pl fatti make.pst.ptcp lor their corsi, course.mpl […]. [...] (*Rime varie*, 15th c.)

'I will see him/them all wanting to prepare themselves to lament my trouble/s with me, as long as the planets are running their course [...]'

The sentences in (4a) and (4b) illustrate the diversity of constructions in the dataset, particularly in texts before the 16th century. The variability suggests that this is a period of change.

#### **4.2** *Ricordar(si)*

This verb pair has the meaning 'remember/remind'. Generally, with a null object complement, *ricordarsi* and *ricordare* have the definition of 'remember'. This use appears to have become a fixed expression, similar to the English 'as far as I remember'. With an indirect object (the addressee of the act of reminding) and a direct object, *ricordare* means 'remind/recall'. With only the direct object, it can also have the meaning 'recount, record', as for example in the context of historical events.

The general distribution for *ricordar(si)* is represented in Table 2. While both verbs select a finite complementizer phrase (usually introduced by *che* 'that') at similar frequencies, the most frequent complements of the TR and PRO verbs are the direct object and the indirect object, respectively.


Table 2: Syntactic environments of *ricordar(si)* (13th–21st c.)

<sup>6</sup>[sic]

The earliest data, from the 13th century, already show a similar pattern. The data record few cases of the intransitive construction with a null object complement for *ricordare* until the 19th century. The construction accounts for 23% of the TR constructions in the 21st century, however. The most notable change for *ricordarsi* is the increase in frequency of the NP complement, as seen in (5).

(5) PRO + NP

Ogni each volta time.fsg che that **si** se.3sg **ricorda** remember.prs.3sg quel that.msg nome name.msg glorioso, glorious.msg pieghi bend.prs.sbjv.3sg i def.det.mpl ginocchi knee.mpl del of.def.det.msg suo his.msg cuore; heart.msg (*Il Concilio di Lione II*, 1274)

'Each time that one remembers that glorious name, one bends the knees of one's heart.'

In contrast to the PRO + PP construction, the PRO construction in (5) maps onto the transitive construction *ricordare qualcosa a qualcuno*, meaning that the logical object is realized as an NP. It is already present in 13th-century data with two occurrences (4.65%) and increases gradually until it represents 38.1% of PRO constructions in the 21st century. It appears to be in competition with the PRO + PP construction, which decreases significantly in the last two centuries of the data and only represents 33.33% of PRO.

Some cases of *ricordare* selecting an indirect object complement are recorded in the earlier periods of the data, as seen in (6). The indirect object complement is marked in (6) by the partitive clitic *ne*, which refers to a PP headed by the preposition *di* 'of'.

(6) Ma but vendichi avenge.prs.sbjv.3sg alle to.def.det.fpl molte many.fpl volte time.fpl grandemente, greatly a at tal such otta time.fsg che that a to pena trouble.fsg ne of.it **ricorda** remind.prs.3sg a to chi who l' it.m ha have.aux.prs.3sg fatto do.pst.ptcp ma but a to noi us non neg esce leave.prs.3sg di from mente mind.fsg mai. never (*Il Libro de' Vizî e delle Virtudi*, 1292)

'But he [i.e., God] punishes harshly many times, at such a time that he who did it hardly remembers it, but we never forget.'

#### **4.3** *Vantar(si)*

In most cases, the PRO and TR verbs mean 'boast, praise'. In early texts, during the period of medieval courts and chivalry, *vantarsi* can have the meaning 'pledge (oneself)', which refers to a future exploit rather than a current or past one. As a pledge made to a person (or being), it follows the schema *vantarsi a* + person/being. This meaning is unique to the PRO verb, which underlines its close relationship with the reflexive use of *se*. An example is given in (7).

(7) PRO + NP

Ma but Parmenione Parmenione che that d' to adestrare ride.beside.inf Biancifiore Biancifiore a to casa house.fsg del of.def.det.msg novello new.msg sposo bridegroom.msg **s'** se.3sg **era** be.aux.ipfv.pst.3sg al to.def.det.msg paone peacock.msg **vantato**, boast.pst.ptcp [...] [...] con with Alcipiades Alcipiades [...], [...] e and con with alcuni some.mpl altri other.mpl giovani young.mpl nobili noble.mpl della of.def.det.fsg città, city.fsg [...], [...] al to.def.det.msg freno rein.msg di of Biancifiore Biancifiore vennero, come.pfv.pst.3pl [...]. [...] (*Filocolo* 4.163, 1336)

'but Parmenione, who had pledged before the peacock to ride beside Biancifiore to the bridegroom's house, [...], and so along with Alcibiades, [...], and other young nobles of the city, [...], he came up to Biancifiore's reins, [...].' (from Boccaccio 1985: 370)

The most common complements of *vantar(si)* are found in Table 3. In 90% of TR occurrences, *vantare* selects an NP complement, while *vantarsi* displays an array of complements. The most frequent is the non-finite complementizer phrase (40.60%), as in the expression *vantarsi di fare qualcosa* 'boast about doing something'. This is followed by the PP complement at 33.51%.

Table 3: Syntactic environments of *vantar(si)* (13th–21st c.)


For both *vantare* and *vantarsi*, the dataset also revealed copula constructions, which are included in Table 3 in the category "Other". The attributes introduced in these constructions are nouns (e.g., *inventore del poema eroicomico* 'inventor of mock-heroic poetry'), past participles (e.g., *nato* 'born'), and adjectives (e.g., *opportuno* 'timely, ready'), as in (8). They appear to derive from elliptical constructions meaning 'boast about being' or 'pridefully claim to be'.

(8) Ecco here.it.is l' def.det.msg astuccio, case.msg Di of pelli skin.fpl rilucenti shining.fpl ornato adorn.pst.ptcp e and d' of oro, gold.msg Sdegnar disdain.inf la def.det.fsg turba, throng.fsg e and gli def.det.mpl occhi eye.mpl tuoi your.mpl primiero first.msg Occupar hold.inf di of sua its.f mole: bulk.fsg esso it.m a to cent one.hundred usi usage.mpl Opportuno appropriate.msg **si** se.3sg **vanta**; boast.prs.3sg e and ad to esso it.m in in grembo, lap.msg Atta apt.fsg agli to.mpl orecchi, ear.mpl ai to.mpl denti, tooth.mpl ai to.mpl peli, hair.mpl all' to.fpl ugne, nail.fpl Vien come.prs.3sg forbita polished.fsg famiglia. family.fsg (*Il giorno*, 1763)

'I see, the throng disdaining with bulk that catches first thine eye, the case adorn'd with glossy skin and gold, whose boast is to be ready for a thousand needs, for in its lap a polish'd family it bears; apt are they for the ears, the teeth, the hair, the nails.' (from Parini 1977: 68)

From the 13th century on, the direct object is the most common complement of TR forms. There are some cases of finite and non-finite complementizer phrases throughout the dataset, but they are not frequent in general. By contrast, the indirect object and the null object are the most frequent PRO complements in the 13th century. The PRO construction involving the non-finite complementizer increases over time, from 15.79% in the 13th century to 37.50% in the 21st century. Interestingly, *vantarsi* is the verb most consistently used across centuries in the intransitive + null object construction.

#### **5 Discussion**

Constructions involving a logical object complement are an area of significant change between the 13th and 21st centuries and account for 49% of the dataset. There are two main ways that a logical object is encoded by these verbs, namely as an oblique or as a direct object. In the former case, the PRO verb selects a PP, introduced by *di* 'of', as in (9a). In the latter case, the TR verb selects an NP complement, as in (9b). For one verb, *ricordarsi*, there is a third option, in which the PRO form selects an NP, as in (9c).

(9) a. PRO + PP

Il def.det.msg Signor Mr.msg Chiari Chiari **si** se.3sg **vantava** boast.ipfv.pst.3sg **d'** of uno indf.det.msg stile style.msg pindarico Pindaric.msg e and sublime; sublime.msg (*L'amore delle tre Melarance colle alusioni al Goldoni e al Chiari*, 1835)

'Mr. Chiari boasted of a Pindaric and sublime style.'

b. TR + NP

Una indf.det.fsg volta time.fsg almeno at.least gli def.det.mpl Italiani Italian.mpl potevano can.ipfv.pst.3pl **vantare** praise.inf il def.det.msg bel beautiful.msg cielo sky.msg d' of Italia. Italy (*L'umiltà nazionale*, 1871)

'At least one time, the Italians were able to praise the beautiful sky of Italy.'

c. PRO + NP

e and benché although molti many.mpl intendano understand.sbjv.prs.3pl meglio better di of me me questa this.fsg materia, matter.fsg penso think.prs.1sg non neg di of meno less di of poter=ne can.inf=of.it significar express.inf il def.det.msg mio my.msg parere, opinion.msg e and tanto so.much più more quanto as.much.as **mi** se.1sg **ricordo** remember.prs.1sg il def.det.msg danno damage.msg che obj.rel averebbe have.aux.cond.3sg potuto can.pst.ptcp far=mi do.inf=me lo def.det.msg sfrenato wild.msg amor love.msg di of dir say.inf il def.det.msg vero, truth.msg di of che that non neg mi se.1sg son be.aux.prs.1sg pentito; repent.pst.ptcp

(*Della dissimulazione onesta*, 1641)

'And although many understand this matter better than me, I think at least that I am able to express my point of view, and even more so since I remember the damage that the/my wild love for speaking the truth could have caused me, of which I have not repented.'

In (9a), the clitic *si* does not fit into the classifications offered by previous prescriptive and descriptive accounts. Its use is close to the "inherent *se*" or "inherently reflexive *se*", as described by Nishida (1994: 426) and by Medová (2009: 8), respectively, which are illustrated in (10) and (11):


[...] [...] **si** se.3sg **ricordò** remember.pfv.pst.3sg **del** of.def.det.msg sogno dream.msg e and corse run.pfv.pst.3sg in in gardino, garden.msg vicino near al to.def.det.msg fiume, river.msg dove where dormendo sleeping l' her aveva have.aux.ipfv.pst.3sg veduta. see.pst.ptcp

(*I racconti delle fate*, 1876)

'[...] he remembered the dream and ran into the garden, near the river where sleeping, he had seen her.'

Similar to *ricordarsi* in (12), the pronominal verbs in (10) and (11) display a pronoun *se/si* that cannot be interpreted as the object of the verb, direct or indirect. The difference between (10), (11) and (12) lies in the existence of a transitive counterpart for *ricordarsi*, i.e., *ricordare*. According to Medová (2009: 8), the inherently reflexive *se* distinguishes itself from the true reflexive by the absence of a corresponding transitive form. However, the expression in (12) cannot be interpreted as a true reflexive, nor does it correspond to the characteristics of an inherently reflexive verb, which lacks a transitive counterpart. The process observed in (12) is the demotion of the logical object to a non-core argument along the hierarchy of grammatical roles. This is a characteristic of the AP construction, which may also entirely suppress the logical object. Examples of this may be found in the intransitive PRO constructions that are not true reflexives and do not select an overt complement.<sup>7</sup> Interestingly, the use of intransitive TR constructions decreases over time for *lamentare* and *vantare*, while the corresponding PRO constructions increase or stay consistent across time. While *ricordar(si)* features the intransitive construction less frequently overall, its frequency declines for *ricordarsi* and increases for *ricordare* in recent centuries.

Additional diagnostics, as described by Polinsky (2017), serve to better examine evidence for the AP. As is the case for AP constructions, the PRO verbs of this study are transitive in meaning, although they are syntactically intransitive (through the presence of *si*). In terms of morphology, the AP bears on case-marking, whereby the non-core status of the object is signaled by case inflection, e.g., an oblique case. Romance case inflection has been greatly depleted since Classical Latin, but the use of prepositions increased and their functions were extended to cover functions previously fulfilled by the case system. As seen for (12), the non-core argument status is thus marked by the preposition *di*. Evidence from Chukchi (Chukotko-Kamchatkan: Russia) suggests that the logical object can also be left unexpressed without a great loss in meaning (Polinsky 2017: 7), which may explain the presence of PRO constructions in the dataset that follow the pattern PRO + Null and that are not true reflexives. As observed for other languages that exhibit an AP, the construction displays a type of "verbal affixation" (Polinsky 2017: 7), which may serve as a more general detransitivizing affix, found in other contexts marking reflexive/reciprocal, middle, passive, and aspect, among others. These characteristics are reflected in Romance *se*, which is similarly polyfunctional: it also functions as a marker of the anticausative, reflexive/reciprocal, etc. and detransitivizes transitive constructions.

The AP has a pragmatic effect in that it places the subject in a position of prominence, while demoting the object to a place of less prominence. This is called "subject prominence" or "agent foregrounding" (Polinsky 2017: 9). The prominence of the subject in the Italian pronominal verbs of this study is indicated not only by the demotion of the logical object, but also by the presence of the clitic *si*, which refers back to the subject, therefore highlighting its position. This concept is further analyzed below with respect to the mechanisms underlying the development of AP morphology.

<sup>7</sup> Such cases are coded to have a null object complement.

The historical perspective of this phenomenon is represented in Figure 2, which traces the distribution of the logical object for seven constructions across time.<sup>8</sup>

Figure 2: Distribution of logical object across time

The earliest periods of the corpus show a stark contrast between the frequencies of the PRO verbs and the TR verbs. According to Figure 2, the PRO or AP construction is strongly preferred until at least the 17th century, when *vantare* + NP surpasses *vantarsi* + PP. In the same period, *lamentarsi* + PP continues to dominate at >90% and *ricordarsi* + PP, while still more frequent than *ricordare* + NP, continues its gradual decline that had started in the 15th century. This decrease of *ricordarsi* + PP is accompanied by an increase in *ricordarsi* + NP, although the latter surpasses the AP construction only in the 21st century. *Ricordare* + NP and *ricordarsi* + PP are at close to equal distribution in the 18th century and the TR verb stabilizes as the preferred construction. There is a sharp decline in *lamentarsi* + PP from the 18th to the 19th century, which is consequently less frequent than *lamentare* + NP for the first time. From the 19th until the 21st century, the abrupt changes in the trendlines point towards noisy or insufficient data; nonetheless, the trend started in earlier centuries continues: TR constructions are preferred for expressing logical objects. This is a considerable change from the 13th century, when the PRO constructions dominated at >70% of logical object expressions.

<sup>8</sup>Cases such as *lamentare* + PP, *lamentarsi* + NP, *ricordare* + PP, and *vantarsi* + NP are excluded from Figure 2, as they account for only 2% of the data. Also, the construction *vantare* + PP does not appear in the data.


While the data convincingly display this change, it is harder to pinpoint the period in which these AP constructions may have developed. The AP constructions are already represented in the earliest stages of the corpus. However, the presence of the detransitivizing *si* signals a connection with the reflexive/reciprocal construction, which is a well-studied source of AP markers. Sansò (2017, 2019) identified the reflexive/reciprocal construction as one of four sources of the AP marker across a 120-language sample and terms it the "best-documented polysemy pattern involving AP constructions" (2017: 193). Creissels (2012) reconstructed the Proto-West-Mande suffix \*-i as the source of a detransitivizing suffix that grammaticalized into the reflexive pronoun *í* for Mandinka, among other Mande languages, and functions as an AP marker for some verbs (2012: 15). This is also observed and studied by Janic (2013) for Slavonic (specifically, Polish and Russian). An earlier paper by Terrill (1997), focusing on the development of AP in Australian languages, examined the diachronic processes by which the verbal morphology of reflexive constructions is reanalyzed and extended to AP constructions, first to a pragmatic AP and then to a structural AP. Her proposal sheds light on a sequence of mechanisms that underlie this change, which could account for the development found in this paper's Italian data.

As with the AP constructions in the Pama-Nyungan languages described by Terrill (1997), the PRO verbs of this sample share verbal morphology with the reflexive construction. Terrill suggests that AP constructions develop from reflexive constructions via extension of their pragmatic function. Reflexive constructions display low transitivity; not only are they semantically and syntactically less transitive than the corresponding non-reflexive verbs, they also tend to have low-transitivity verbs, non-agent subjects, and "non-distinct" objects (Terrill 1997: 81), and their agent and object are coreferential. In AP constructions, the patient similarly has low prominence and the verbs display lowered transitivity, but the agent and object are not coreferential. By extending the verbal morphology from the reflexive environment to the AP environment, a similar pragmatic situation is maintained, although the agent and object are no longer coreferential. It is plausible that a similar mechanism operated in the extension of the function of reflexive *si* to the AP construction. Support for this is found in the dataset, as seen in (13), where the reading is ambiguous between a reflexive (a woman (*ella*) bemoaning herself) and an AP (a woman grieving [someone]):

(13) [...] onde io veggendo ritornare alquante donne da lei, udio dicere loro parole di questa gentilissima, com'ella **si lamentava**;

[...] so.that I see.ger return.inf some.fpl woman.fpl from her hear.pfv.pst.1sg say.inf them word.fpl of this.fsg gracious.sup.fsg how she se.3sg lament.ipfv.pst.3sg

(*La vita nuova*, 1292)

'Seeing some ladies come away from her I heard them describe Beatrice's lamentations [lit. how she grieved].' (from Alighieri 1964: 71)

The context of this excerpt reveals that the woman is grieving the death of her father and therefore provides evidence for an AP reading. Terrill also suggests that after a first extension of the reflexive to the similar pragmatic situation of the AP construction, its function is reanalyzed as both reflexive and AP. This is considered a pragmatic AP. In a third stage, a new construction emerges, the syntactic AP. It maintains the pragmatic AP's structure, but its pragmatic function becomes secondary, demoted by the structural function.

In this account of the development of the AP, the transition from one stage to another is facilitated by one or more shared characteristics, first pragmatic/semantic, then structural. The data appear to mirror this development. However, this account does not explain the decrease in frequency of the AP construction with a logical object. This may be due to the loss of the pragmatic function, and the syntactic AP may have subsequently competed with the transitive construction. Despite this development, the constructions in which *lamentarsi* and *vantarsi* select a null object complement, as in (13), represent a significant percentage of each verb pair's occurrences in the 21st century, at 58% and 20.83% respectively. They may be remnants of AP constructions, which have been lexicalized. I also propose that *ricordarsi* + NP, which is already present in the 13th-century data and becomes more frequent than *ricordarsi* + PP in the 21st century, existed as a competitor to the AP construction. As the pragmatic function receded, the AP morpheme *si* may have been reanalyzed as a dative reflexive, a function which was already present at this point. This developed into the dominant PRO construction in the 21st century. However, this is an initial, tentative explanation of the diachronic processes triggering change in this dataset, and further data would be required to confirm it.

The changing and at times ambiguous meanings of the verbs provide another perspective in this historical narrative. It is possible to find contexts in which the reading of *lamentar(si)* is ambiguous, as the boundary between lamenting oneself (implies inner torment or other suffering) and complaining (implies dissatisfaction) can be vague. Also, the act of remembering is almost inextricable from the (unconscious) act of reminding oneself of something. As for *vantarsi*, the more reflexive meaning of 'pledge (oneself)' in the 13th century may have provided the starting point from which the pragmatic AP construction developed. With the presence of the reflexive meaning and the AP construction in the 13th c., it is at least possible to suggest that the emergence of the AP was underway.

#### **6 Conclusion**

In response to the questions laid out in the introduction, the PRO verbs and their TR counterparts display a great deal of variation in terms of their distribution across time and syntactic environments. The overall distribution reveals an important trend: the PRO forms are more frequent overall, but experience a decline starting in the 16th century for *vantarsi*. This development is reflected in the distribution of the logical object constructions, where constructions with PRO forms were preferred early on as well. However, the TR forms start to dominate from the 17th century onwards, which suggests a decline of the AP construction.

I propose three lines of inquiry that could deepen and broaden this study: analyzing the diachronic relationship between the semantic roles of these verbs and their argument structure, examining dialectal variation in Italo-Romance, and determining if there are similar patterns across Romance languages.

As suggested in §5, the low transitivity of *lamentar(si)*, *ricordar(si)*, and *vantar(si)* facilitated the extension of reflexive verbal morphology to the AP construction. This is reflected in the semantic roles of these verbs: Experiencer subjects with low agentivity (and volition) and Theme objects that are little or not affected by the action. Change over time of semantic roles could account for diachronic variation of constructions and support the proposal for the emergence of the AP construction.

For the purpose of this study, I excluded data that presented prominent non-Tuscan features. A further study could include these data and examine the extent to which interdialectal contact shapes the use and distribution of the AP construction. The presence of verb pairs with similar characteristics in other Romance languages, such as French *(se) vanter (de)* 'praise, boast', suggests that a similar pattern might exist more broadly in Romance. It remains to be examined if the AP construction affects the same classes of verbs and if its distribution follows a comparable trajectory across time. Additional diachronic studies in other Romance languages examining these constructions could provide further evidence in favor of the AP construction as a Romance phenomenon, while also tracing its emergence back to a common source. This would be a valuable contribution to the historical research of the AP, which has tended to focus on ergative languages and other accusative languages.

### **Abbreviations**


### **Acknowledgements**

I wish to thank Cinzia Russi for her helpful feedback on data and drafts of this article, as well as the three anonymous reviewers for their valuable comments. Any errors remain my own.

### **Corpora**

Opera del Vocabolario della lingua Italiana (OVI, https://artfl-project.uchicago.edu/content/ovi).

Corpus Tesoro della lingua Italiana delle Origini (TLIO, http://tlioweb.ovi.cnr.it/(S(nir13gk3cbh0jfugkf2xow3a))/CatForm01.aspx).

Corpus di italiano scritto (CORIS/CODIS, http://corpora.dslo.unibo.it/coris\_ita.html).

Biblioteca dei Classici Italiani (http://www.classicitaliani.it/).

IntraText Digital Library (IntraText, http://www.intratext.com/).

Corpus Diacronico dell'Italiano Scritto (DiaCORIS, http://corpora.dslo.unibo.it/DiaCORIS/).

#### Karina High

#### **Primary Sources**

Accetto, Torquato. 1641. *Della dissimulazione onesta*. (IntraText)

Alighieri, Dante. 1292. *Vita nuova*. (OVI)

Alighieri, Dante. 1964. *The new life. La vita nuova. Translated with an introduction by William Anderson.* Trans. by William Anderson (Penguin classics). Baltimore: Penguin Books.

Boccaccio, Giovanni. 1336. *Filocolo*. (IntraText)

Boccaccio, Giovanni. 1985. *Il Filocolo / Giovanni Boccaccio; translated by Donald Cheney with the collaboration of Thomas G. Bergin.* Trans. by Donald Cheney and Thomas G. Bergin. New York: Garland Pub.

Collodi, Carlo. 1876. *I racconti delle fate*. (Biblioteca dei Classici Italiani)

Collodi, Carlo. 1871. *L'Umiltà nazionale. Il Fanfulla*. (Biblioteca dei Classici Italiani)

da Barberino, Andrea. 1491. *I reali di Francia*. (IntraText)

Giamboni, Bono. 1292. *Il Libro de' Vizî e delle Virtudi*. (IntraText)

Giamboni, Bono. 2013. *Le Livre des vices et des vertus*. Trans. by Sylvain Trousselard & Elisabetta Vianello (Textes littéraires du Moyen Âge 23). Paris: Classiques Garnier.

Giambullari, Bernardo. 15th. *Rime varie*. (Biblioteca dei Classici Italiani)

Goldoni, Carlo. 1760. *I Rusteghi*. (Biblioteca dei Classici Italiani)

Goldoni, Carlo. 1961. *Three comedies*. Trans. by Clifford Bax, I.M. Rawson, Eleanor Farjeon & Herbert Farjeon (Oxford library of Italian classics xxvii, 293 p.) London: Oxford University Press.

Gozzi, Carlo. 1761. *L'amore delle tre Melarance colle alusioni al Goldoni e al Chiari*. (Biblioteca dei Classici Italiani)

Papa Gregorio X. 1274. *Secondo Concilio di Lione*. (IntraText)

Parini, Giuseppe. 1763. *Il giorno*. (IntraText)

Parini, Giuseppe. 18th. *Il primo bacio*. (Biblioteca dei Classici Italiani)

Parini, Giuseppe. 1977. *The day: morning, midday, evening, night: a poem*. Trans. by H. M. Bower. Westport, Conn.: Hyperion Press.

Salvatorelli, Luigi. 1943. *Pensiero e azione del risorgimento*. (DiaCORIS)

Anon. 1270. *Quindici segni del giudizio*. (OVI)

#### **References**




## **Chapter 5**

## **The role of SE in Spanish agreement variation**

Irene Fernández-Serrano<sup>a</sup>

<sup>a</sup>Universitat Autònoma de Barcelona

This paper analyses Spanish agreement variation in non-paradigmatic SE structures. It is argued that in European Spanish the attested alternation between agreement and lack of agreement is part of a single grammar, i.e. a case of intra-speaker optionality. To support this claim it is shown that neither definiteness nor Case assignment are responsible for the lack of agreement pattern. The proposal combines two basic ingredients: the special featural configuration of SE (Mendikoetxea 1999, D'Alessandro 2008) and the parametrization of the order of syntactic operations (Obata et al. 2015, Obata & Epstein 2016). This analysis reflects the asymmetries with respect to Italian and Icelandic data and is compatible with a similar case of variation in Spanish dat-nom psych-verb structures.

#### **1 Introduction**

This paper investigates lack of agreement in Spanish SE structures when the Internal Argument (IA) is not animate (and therefore does not require Differential Object Marking (DOM)<sup>1</sup>) and remains in postverbal position.<sup>2</sup> The basic asymmetry is presented in (1):

<sup>2</sup>As different authors have noted (see Ortega-Santos 2008 and references therein) number mismatches with postverbal subjects are pervasive across languages. I leave this matter aside since it is beyond the scope of this paper.

Irene Fernández-Serrano. 2023. The role of SE in Spanish agreement variation. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 85–106. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525100

<sup>1</sup>The requirements for DPs to be DOM-marked in Spanish are much more complex and still subject to debate (see Leonetti 2004, Rodríguez-Mondoñedo 2007, López 2012 among many others).

(1) a. Agreeing SE

Se discutieron los resultados.

SE discuss.3pl.pst the results

'The results were discussed / Someone discussed the results.'

b. Non-agreeing SE

Se discutió los resultados.

SE discuss.3sg.pst the results

'The results were discussed / Someone discussed the results.'

In (1a) we see the "standard" pattern that I refer to as "agreeing SE",<sup>3</sup> where the verb *discutieron* 'discussed' agrees with the IA *los resultados* 'the results'. This structure is subject to variation, reflected in (1b), where there is a lack of agreement: the verb *discutió* shows 3rd singular inflection despite the fact that the IA is plural.

The goal of this paper is to give a syntactic analysis for the asymmetry exemplified in (1). I argue that this analysis has to account for two main aspects of the phenomenon: (i) intra-speaker optionality and (ii) number/person agreement asymmetry. While (ii) has been widely explored and discussed in the literature, (i) is a novel hypothesis.

For (i), I maintain that the attested non-agreeing pattern (Raposo & Uriagereka 1996, D'Alessandro 2008, Mendikoetxea 1999, Ormazabal & Romero 2019, Sánchez López 2002, among many others) freely alternates with its agreeing counterpart and that this alternation belongs to a single grammar. To defend this idea, I provide evidence from oral interviews extracted from a corpus, which shows that a single speaker may produce either pattern interchangeably. Syntactically, I argue that there are no specific properties, such as the shape of the IA, responsible for the lack of agreement.

Regarding (ii), it has been shown that the asymmetry only affects number, since person agreement is always banned in SE structures (López 2007). Previous analyses have related this fact to a person restriction on the IAs by means of Multiple Agree or similar mechanisms (D'Alessandro 2008, López 2007). However, I show that these approaches do not capture the fact that strong pronouns are always banned in Spanish SE structures, as opposed to other languages such as Italian or Icelandic where the restriction only holds for 1st and 2nd person pronouns.

<sup>3</sup> I want to highlight that I shorten the whole terms "agreeing SE pattern" and "non-agreeing SE pattern." I do not want to imply with these labels that the asymmetry relies on agreement with the clitic.

The gist of the analysis is not new: it is based on defective intervention effects (Chomsky 2001) created by an oblique element that can be avoided via movement (Sigurðsson 1992, Sigurðsson & Holmberg 2008). The novelty is that Spanish SE has not been treated as an intervener before, nor as a case of intra-speaker optionality. In particular, I follow the idea that this optionality is part of syntax (Biberauer & Richards 2006) and adopt Obata et al.'s (2015) and Obata & Epstein's (2016) proposal that different grammatical outputs are explained by the timing of syntactic operations.<sup>4</sup> I keep the intuition that the specific featural configuration of SE has an impact on agreement, mainly blocking the possibility of person agreement with the IA. The new twist is that in Spanish the number asymmetry may be explained via cliticization, since Multiple Agree does not capture the impossibility of strong pronoun licensing.

The paper is structured as follows: §2 introduces SE structures and the variation data. In §3, I present in more detail the puzzles that SE presents for Case and Agreement, and I review some previous approaches. §4 focuses on the proposal and a possible extension to dat-nom Spanish psych-verbs agreement variation. §5 concludes the paper.

#### **2 The data**

#### **2.1 SE structures in Spanish**

Non-paradigmatic *se* (SE) has been considered a defective clitic in Spanish because it does not have number or person inflection.<sup>5</sup> It has been described as a 3rd singular clitic, but its exact φ-featural content differs among proposals (see Torrego 2008 and references therein). Here, I analyse SE as a clitic with a valued 3rd person feature and an underspecified number feature, following D'Alessandro (2008).

SE structures are characterized by showing only one overt DP in IA position, which traditionally has been considered to be the subject when there is agreement (6a), or the direct object, when there is no agreement. These configurations have been called "passive SE" and "impersonal SE" respectively (see Mendikoetxea 1999, Sánchez López 2002 and references therein).

<sup>4</sup>This is a formulation within the Minimalist Program of an old idea already present in the P&P framework (for instance in treatments of wh- or verb movement). See Georgi (2014) and references therein.

<sup>5</sup> I focus on a very specific instance of *SE*, since it is well known that this clitic is involved in a varied range of structures (reflexive, inchoative, aspectual, etc.), not only in Spanish but in Romance languages in general (see for instance Mendikoetxea 2012 and MacDonald 2017).

I adhere to the perspective of more recent proposals that cast doubts on a double derivation and consider that there exists only one SE structure with different agreement outcomes (Pujalte & Saab 2014, Ormazabal & Romero 2019, Gallego 2016, 2019). There are at least two main reasons for supporting this view. On the one hand, regarding interpretation, in current Spanish both patterns are virtually equivalent: SE "absorbs" the External theta-role (Cinque 1988) and the subject gets an "arbitrary" or "indefinite" interpretation (Raposo & Uriagereka 1996, among others).<sup>6</sup> The comparison between a regular transitive sentence and a SE sentence is exemplified in (2):


b. Se discutieron/discutió los resultados.

SE discuss.3pl.pst the results

'The results were discussed / Someone discussed the results.'

On the other hand, as I show in the next sections, there seem to be no specific syntactic conditions that lead us to think that there exist two different derivations. Before moving to the theoretical aspects, I introduce variation data in more detail.

#### **2.2 Variation: non-agreeing SE**

In this section, I present the variation data and defend the hypothesis that for some speakers, agreement with the IA in SE structures is optional. In particular, I show that not only are the two variants used by one speaker, but they also show no asymmetries in syntactic properties, such as verb aspect or definiteness of the IA.

In Arias & Fernandez-Serrano (In press), we highlight that the precise distribution of non-agreeing SE remains a mystery, even though the phenomenon has traditionally been present in the literature on Spanish grammar (see Sánchez López 2002). Consider for instance the examples reported in ALPI (*Atlas Lingüístico de la Península Ibérica*):

<sup>6</sup> It is beyond the scope of this paper to focus on the semantics of SE structures; however, it is important to note that, roughly speaking, in SE structures the subject is either unknown to the speaker or the speaker does not want to make it explicit. In English, the closest translation uses the pronoun 'one', although I also use passives or an arbitrary 'they' in the examples.

(3) a. Se necesita obreros.

SE need.3sg workers

'Workers are needed.'

b. Se vende patatas.

SE sell.3sg potatoes

'Potatoes are sold.'

c. En el huerto se podía plantar rosales.

in the orchard SE could.3sg plant.inf rose-bushes

'In the orchard rose bushes could be planted.'

The fact that non-agreeing SE has been normatively banned and considered an error typical of oral speech may be responsible for the lack of a comprehensive empirical study and, at the same time, shows that the phenomenon exists and is pervasive to some extent (Sánchez López 2002: 36). By way of illustration, the works that mention the distribution generally describe it as more frequent in American varieties (Mendikoetxea 1999 and references therein) without further specification. Given the rich linguistic diversity of American varieties, it is doubtful that no specific factors, such as language contact, influence the phenomenon. I am not in a position to fill this gap, but it is worth noting that data from new corpora challenge some of the traditional ideas. Let me sketch two of them.

Firstly, the incidence of the phenomenon in different sociolinguistic contexts should be explored. In particular, I want to highlight that there are plenty of examples from the press (see Arias & Fernandez-Serrano In press), suggesting that it is not so straightforward to relegate the phenomenon to oral speech. Some of these examples from different dialectal areas are shown in (4):

(4) a. Mexico

Se descubrió las verdaderas causas de su renuncia.

SE discover.3sg.pst the real reasons of his resignation

'The real reasons for his resignation were discovered.'

b. Venezuela

Aún no se tiene datos específicos de los daños.

yet no SE have.3sg data.pl specific of the damages

'There are no specific data from the damages yet.'

c. Bolivia

En la propuesta técnica se consideró estos aspectos.

in the proposal technical SE consider.3sg.pst these aspects

'These aspects were not considered in the technical proposal.'

d. Argentina

Hasta el día de hoy, no se sabe las causas exactas de su muerte.

until the day of today no SE know.3sg the reasons exact of his death

'To this day the exact cause of his death is not known.'

Secondly, some syntactic properties have been identified as being responsible for lack of agreement. One of them is the degree of definiteness of the DP, NPs being the most likely to appear in non-agreeing SE (Mendikoetxea 1999: 1677). Notice that the examples in (4a), (4c) and (4d) already challenge this point (see also DeMello 1995). In the same vein, imperfective verbal aspect has been pointed out as favouring agreement (Mendikoetxea 1999: 1678). Again, the examples in (4a) and (4c) prove that even if this is a tendency, the phenomenon is possible with perfective verbal aspect.

Even if the prevailing view is on the right track, meaning that the phenomenon is more widespread in American varieties, that does not preclude its presence in European Spanish, as the examples in (3) reveal. This is corroborated by the data collected in COSER (Fernández-Ordóñez 2005). This corpus contains transcriptions of interviews with elderly speakers from rural areas of Spain. Consider (5):

(5) a. no se echaba esos compuestos que se echan en la comida

no SE put.impfv.3sg those compounds that SE put.3pl in the food

'They(arb) didn't put those compounds that are put in the food'

b. Y con manteca, se hacía unas gachas y eso alimenta…

and with butter SE make.impfv.3sg some oatmeal and that feeds

'And with butter, they(arb) made oatmeal and that is nourishing'

c. También se cultiva muchas cebollas

also SE cultivate.3sg much onions

'A lot of onions were also cultivated'

d. se corta los trozos gordos y aluego se hacen trocitos

SE cut.3sg the pieces big and then SE do.3pl pieces

'It is chopped in big pieces and then little pieces are made'

Examples (5a) and (5d) are especially relevant since they contain agreeing and non-agreeing SE, illustrating that the same speaker may alternate between both options. It is also worth pointing out that these examples differ from the ones from ALPI in (3) in that they are instances of spontaneous speech. This is most likely the reason why all the examples from ALPI contain a bare NP, while we see that in oral speech examples with DPs are also possible. In fact, only one third of the non-agreeing SE examples gathered from the corpus contained bare NPs (Arias & Fernandez-Serrano In press).

This evidence aligns with the facts indicated above about the data in (4) from American varieties. However, to avoid the risk of overgeneralization, I am going to consider only the last pieces of evidence for my analysis; that is to say, I restrict my proposal to oral European Spanish data. In sum, the key aspect that I attempt to reflect is that there are two possible agreement outcomes in SE contexts that can freely alternate regardless of the aspect of the verb or the shape of the IA.

#### **3 The puzzle of SE**

#### **3.1 Agreement**

This section reviews the theoretical challenges that SE structures present regarding agreement and Case. The basic asymmetry is repeated in (6) below. In (6a), there is number agreement between T(ense) and the IA *los resultados* 'the results'; while in (6b), the verb is inflected in 3rd person singular, thus there is no agreement with the IA.

(6) a. Agreeing SE

Se discutieron los resultados.

SE discuss.3pl.pst the results

'The results were discussed / Someone discussed the results.'

b. Non-agreeing SE

Se discutió los resultados.

SE discuss.3sg.pst the results

'The results were discussed / Someone discussed the results.'

The first question about these patterns concerns what φ-features are involved in agreement. It is clear that there is an asymmetry in number agreement between (6a) and (6b), whereas there is no apparent problem regarding person: the IA is 3rd person and the verb shows 3rd person morphology. However, this point needs more careful consideration, since, as is well known, 3rd person morphology may reflect default valuation (see Preminger 2014, among many others).

The following examples show that person verbal morphology is not possible in SE contexts (7a) as opposed to non-SE (regular active) contexts (7b):

(7) a. \* Se vimos unos lingüistas en el mercado ayer.

SE see.1pl.pst some linguists in the market yesterday

Intended meaning: 'Some of us linguists were seen in the market.'

b. Unos lingüistas vimos/visteis/vieron un gran melón.

some linguists see.1pl/2pl/3pl.pst a big watermelon

'Some (of us/of you) linguists saw a big watermelon.'

As we see in (7b), a person mismatch between the verb and the 3rd person subject is allowed in Spanish,<sup>7</sup> but that option is excluded with SE. López (2007) takes this as evidence that the IA must be 3rd person or, in other words, that it can never be 1st or 2nd person.

This constraint, known as "person restriction" (Burzio 1986), has typically been described for Italian (D'Alessandro 2008, Pescarini 2018) and for Icelandic quirky subject structures (Sigurðsson 1992 and following work):

	- a. \* In in televisione television si SE vediamo see.1pl spesso often noi. we. 'One often sees us on TV.'
	- b. In in televisione television si SE vediamo see.1pl spesso often Maria/lui. Maria/he. 'One often sees Maria/him on TV.'

<sup>7</sup>This phenomenon is referred to as "unagreement", "anti-agreement" or "disagreement", see Höhn (2015) for a detailed overview.

	- a. \* Honum he.dat líkið like.2pl Þið. you.pl. 'He likes you.'
	- b. Honum he.dat líka like.3pl.pst Þeir. they. 'He likes them.'

In Italian, the person restriction arises precisely in SE (*si* in Italian) contexts, as shown in (8). From the comparison with this language (see (10)–(11) below), it can be concluded that such a restriction does not hold for Spanish SE, since third person pronouns are also banned (Ordóñez & Treviño 2016: 248):<sup>8</sup>

	- a. Lui He si SE vede see.3sg spesso often in in televisione. television. 'One often sees him on TV.'
	- b. \* Tu You si SE vedi see.2sg spesso often in in televisione. television. 'One sees you often on TV.'
	- b. \* Se SE ve/s see.3sg/2sg tú you a menudo often en in televisión television 'One sees you often on TV'

Different authors have considered that this restriction is due to the fact that the Probe, T, establishes a relationship with both the IA and the clitic (D'Alessandro 2008, López 2007). This is possible by means of a Multiple Agree (Hiraiwa 2001) link, where one Probe is able to agree with more than one Goal:

<sup>8</sup>Rivero (2004) shows that certain contexts in Spanish, structures involving specific psychological verbs with inherent *SE* morphology seem to be subject to the person restriction. She argues against a Multiple Agree perspective to account for such scenarios, which I leave aside in this paper.

	- α > β > γ

As argued by D'Alessandro (2008), the Multiple Agree approach correctly captures the person restriction found in Icelandic and Italian, following a condition on feature specification (Anagnostopoulou 2005). This condition bans Multiple Agree if the two Goals do not share the same feature value. In this case, since SE is 3rd person, the DP must also be 3rd person for the derivation to converge (see D'Alessandro 2008: §3 for a detailed discussion).<sup>9</sup>
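The condition on feature specification just described can be illustrated with a minimal Python sketch. The names `Goal` and `multiple_agree_converges` are hypothetical, and the sketch assumes, with the text, that SE carries a valued 3rd person feature. It models only person values and is not meant as a full implementation of Multiple Agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """A goal reduced to a label and a person value (1, 2 or 3)."""
    label: str
    person: int


def multiple_agree_converges(goals):
    """Return True if all goals share one person value.

    Toy rendering of the condition on feature specification: Multiple
    Agree is banned when the goals differ in feature value.
    """
    return len({g.person for g in goals}) <= 1


# Assumption taken from the text: SE (Italian si) is 3rd person.
SE = Goal("si", 3)

print(multiple_agree_converges([SE, Goal("lui", 3)]))  # True
print(multiple_agree_converges([SE, Goal("tu", 2)]))   # False
```

Under these assumptions the sketch reproduces the Italian contrast in (10): a 3rd person DP is compatible with SE, whereas a 2nd person pronoun is not.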

Since, as we saw in (11), there is no person restriction in Spanish SE, an alternative analysis to Multiple Agree should be considered. In this sense, I suggest that Spanish nom pronouns require a φ-relationship with a Probe, which is prevented in SE contexts. In non-agreeing SE, T does not reach the IA at all, while in agreeing SE there is only number agreement and, consequently, pronouns cannot be licensed. This will become clearer in §4.

#### **3.2 Case**

Non-agreeing SE (traditionally considered "impersonal SE") has been analysed by arguing that T agrees either with SE (Raposo & Uriagereka 1996, López 2007, Pujalte & Saab 2014, Ormazabal & Romero 2019) or with a null *pro* (Otero 1986, Cinque 1988, Bosque & Gutiérrez-Rexach 2009, Torrego 2008), while the IA is assigned acc Case. In this section, I present evidence, following ideas by Ordóñez & Treviño (2016), that challenges the assumption that the IA of non-agreeing SE is always acc.

There are only two tests to check for acc in Spanish: the presence of DOM ("a" preceding animate objects), and the possibility of paraphrasing the argument by an acc clitic (pronominalization). In SE contexts, DOM is compulsory when the IA is a definite+animate argument as in (13a).<sup>10</sup> Pronouns must also be DOM-marked, but the phrase can be dropped since there is obligatory clitic doubling, as we see in (13b):

(13) a. Se SE ve see.3sg \*(a) dom María Maria en in televisión. television. 'One sees Mary on TV.'

<sup>9</sup>In López's (2007) system, the Probe targets a Complex Dependency that has been previously formed between the two Goals. This dependency requires a condition on feature coincidence similar to the one on Multiple Agree, hence I do not treat these proposals separately.

<sup>10</sup>See fn. 1.

b. Se SE \*(te) you.acc ve see.3sg (a dom ti) you.obl en in televisión. television. 'One sees you on TV.'

DOM seems a robust test for supporting that the IA of non-agreeing SE is acc. However, some authors challenge this evidence by showing that in SE contexts, DOM arguments are prone to be pronominalized by a dat clitic (*le*), instead of an acc (*lo*), even in non-leísta dialects (Mendikoetxea 1999, Ordóñez & Treviño 2016).<sup>11</sup> Consider the following data from Mexican Spanish where there is no *leísmo*, as (14) shows, but the dative clitic appears in combination with SE:

	- a. **A** dom Juan/Sara Juan/Sara **lo**/**la** cl.acc.3msg/cl.acc.3fsg vieron see.3pl.pst cantando. singing. 'They saw Juan/Sara while s/he was singing.'
	- b. **A** dom Juan/Sara Juan/Sara *se* SE *le* cl.dat.3sg vio see.3sg.pst cantando. singing. 'One saw Juan/Sara while s/he was singing.'

Let me now show what happens when the IA does not accept DOM-marking. The only test that allows us to do that is pronominalization. If these arguments were assigned acc we would expect pronominalization to be possible. However, as different authors have pointed out (Torrego 2008, Ordóñez & Treviño 2016), this does not seem to be the case:<sup>13</sup>

	- a. \* Esos those libros books se SE los/les acc.3mpl/dat.3pl prohibió prohibited.3sg en in el the franquismo. franquismo. 'Those books were banned during Franco years.'
	- b. Torrego (2008: 788), from Ordóñez (2004) \* El the arroz, rice se SE lo acc.3msg come eat.3sg cada every domingo Sunday en in este this hostal. hostel. 'In this hostel they eat rice every Sunday.'

<sup>11</sup>"Leísta" speakers use the dative clitic *le* instead of the accusative clitic when the object is animate (Fernández-Ordóñez 1999, among many others).

<sup>12</sup>Boldface and italics in original, translation is mine.

<sup>13</sup>Some members of the audience of LSRL50 wondered about the factors that make (15b) worse than (15a). I agree with this judgement, but the reason for this contrast is not obvious.

Note that the IA, here fronted, is not replaceable by either an acc or a dat clitic (15b). The conclusion is that the IA cannot be a clitic, regardless of whether the speaker is "leísta" or speaks a dialect in which the "se le" effect of (14b) applies.<sup>14</sup>

In sum, in this section I have outlined two puzzles that non-agreeing SE presents. Firstly, that person agreement is never possible and that there is no person restriction on the IA, meaning that all nom pronouns are banned. Secondly, the IA does not seem to receive acc Case, especially when the IA is a non-DOM object. This evidence leads me to suggest an analysis where the IA in non-agreeing SE structures gets valuation by default since there is no relationship, either regarding Case or agreement with T (or *v*).<sup>15</sup>

#### **4 Proposal: SE intervention and how to avoid it**

In this section, I formulate an analysis that takes into account the pieces of the puzzle outlined so far. The gist of my proposal for non-agreeing SE is that the clitic creates an intervention effect that blocks agreement with the IA. The optional alternation with the agreeing version follows from applying a relative timing of AGREE and MOVE (Obata et al. 2015, Obata & Epstein 2016). This analysis becomes more transparent when comparing SE contexts with the variation found in dat-nom configurations in Spanish, to which I devote the last part of the section. Let me explore these points in turn.

In the previous sections, I have highlighted that lack of agreement is not a consequence of the specific shape of the IA when it is not animate. This claim allows us to discard an analysis based on definiteness effects (see for instance Belletti 1988) and leads us to think that there is another factor that prevents the Probe from reaching the expected Goal. The hypothesis that I put forward here is that it is SE itself that creates this "barrier".

<sup>14</sup>A reviewer wonders if DOM may be a precondition for the appearance of the clitic, which would be coherent with an analysis of DOM à la López (2012). In fact, Mendikoetxea (1999) already notes that pronominalization of non-animate IAs seems to be favoured if they are DOM-marked, although some examples with non-DOM objects are also attested. For the latter, Ormazabal & Romero's (2019) analysis can be adopted, by which the clitic is the spell-out of agreement between T and the IA. What is important for the main discussion is that neither of those cases seems to support the claim that the IA gets acc Case.

<sup>15</sup>For the sake of simplicity, I am not considering the role of *v\**. For these types of structures that show a T-IA relationship, it can be assumed that *v* is φ-defective, meaning that it does not trigger Spell-Out or assign Case to its complement (Chomsky 2001).

This type of blocking effect has been extensively studied in the literature under the name "defective intervention" (Chomsky 2000). This effect arises when an element is visible for the Probe's search but is not a suitable Goal itself:

(16) Defective Intervention (Hiraiwa 2001: 69)

$$
\alpha > \beta > \gamma \qquad {}^{*}\mathrm{Agree}(\alpha, \gamma)
$$

As we see in (16), where > stands for c-command, if the Probe α finds the intervener β before reaching the Goal γ, β prevents agreement between α and γ. In SE contexts, β is the clitic SE and γ the IA.

The paradigmatic example of this effect is found in Icelandic quirky subject configurations, where the dat argument blocks agreement with the nom IA in biclausal contexts (Sigurðsson 1992, Boeckx 2000, Holmberg & Hróarsdóttir 2003, Sigurðsson & Holmberg 2008, Preminger 2014).

	- a. Mér me.dat virðast seem.3pl [hestarnir the-horses.nom vera be seinir]. slow. 'It seems to me that the horses are slow.'
	- b. Það expl virðist/\*virðast seem.3sg/\*seem.3pl einhverjum some manni man.dat [hestarnir the.horses.nom vera be seinir.] slow.

'A man finds the horses slow.'

As we see in (17b), when the dat is *in situ*, between the verb and the nom, there is lack of agreement; while this is avoided in (17a), where the dat has raised above the verb. Sigurðsson & Holmberg (2008) show that these types of agreement configurations are also subject to variation in Icelandic, distinguishing three dialects: one with only agreement; a second one with only lack of agreement; and a third where speakers accept both variants. They propose a competing grammars approach for the latter, since speakers seem to alternate between the other two dialects.

There is a crucial difference between Icelandic variation and the one I present here, namely, there seems to be no dialect in Spanish where lack of agreement is always compulsory. In this sense, I want to defend the possibility of a true optionality within the same dialect (grammar), following Biberauer & Richards (2006). Consequently, I am not arguing that there are two distinct dialects or that a speaker can be bi-dialectal, but that both outcomes of agreement are possible as part of the same grammar.<sup>16</sup>

Coming back to the specifics of SE configurations, remember that I assume that SE is endowed with a valued 3rd person feature and an underspecified number feature (D'Alessandro 2008). The crucial idea is that SE does not lack a number feature and, therefore, number is visible for the Probe. My hypothesis is that SE triggers the same effect as the dat and makes it impossible for T to reach the IA, as we see in (18):<sup>17</sup>

(18) Non-agreeing SE

A direct consequence of this analysis is that, if there is no T-IA relationship, the IA cannot be assigned Case. However, I would like to suggest that this is not necessarily a negative consequence; rather, it correctly predicts the asymmetry in licensing between non-animate DPs and pronouns that I presented in §3.1 (see (11), repeated here as (19)).

	- b. \* Se SE ve/s see.3sg/2sg tú you a menudo often en in televisión. television. 'One sees you often on TV.'

It is an old observation that in Spanish and in other Romance languages such as Catalan, person agreement is the key licensing factor for nom, which makes the appearance of nom pronouns impossible in *SE* structures (Bianchi 2001, 2003, Rigau 1991). The question is then how non-DOM-marked IAs of SE sentences are licensed.<sup>18</sup> Either they are licensed by other means, such as focus (Belletti 2001, Rosselló 2000, Etxepare & Gallego 2019, among others), or they receive another non-morphologically realized Case from *v*. The latter option seems more promising considering the parallelism with QS structures. A specific flavour of *v*, capable of assigning nom, has been argued to appear in such structures (Boeckx 2008, López 2007, Gallego 2018).

<sup>16</sup>An important matter left for future research is the role of preferences in grammar, since agreeing SE is much more frequent than non-agreeing SE.

<sup>17</sup>The exact position where SE is first-merged is tangential to my discussion. I assume that it is c-commanded by T and higher than the IA. It can be either an EA position in Spec,*v* (following Raposo & Uriagereka 1996, D'Alessandro 2008, Torrego 2008) or heading Voice (see MacDonald 2017 and references therein).

<sup>18</sup>We assume here the analysis of Ordóñez & Treviño (2016: 248) that DOM objects receive inherent Case.

Turning now to agreeing SE, it is plausible that in this case the intervention effect is avoided exactly as in Icelandic, i.e. by the movement of the intervener. Note that SE must cliticize on T (Cinque 1988, D'Alessandro 2008); therefore, SE must always raise.<sup>19</sup> If we assume that rule-ordering is not predetermined, and that MOVE and AGREE may feed one another indistinctly (see Georgi 2014), two outcomes are possible. Agreeing SE is the result of SE cliticizing on T before Agree takes place (see (20)), whereas in non-agreeing SE cliticization happens after Agree (cf. (18) above).

(20) Agreeing SE SE-T …<SE> …IA

In sum, I propose that both (20) and (18) are possible derivations in a speaker's grammar; the only difference between them is the order of the operations MOVE and AGREE (Obata et al. 2015, Obata & Epstein 2016).<sup>20</sup> The asymmetry is summarized in (21):

(21) a. Agreeing SE: MOVE > AGREE

b. Non-agreeing SE: AGREE > MOVE
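The two orderings in (21) can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: the names `Item`, `probe_number` and `derive` are hypothetical, only number is modelled (person is ignored), the structure is flattened to an ordered search path, and default valuation is rendered as singular, in line with the 3sg morphology of (6b).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    label: str
    number: Optional[str]  # "sg", "pl", or None when underspecified


def probe_number(path):
    """T probes downward; the first visible item ends the search.

    An item with an underspecified number feature is visible but cannot
    value T, so T surfaces with default singular (defective intervention).
    """
    if not path:
        return "sg"
    first = path[0]
    return first.number if first.number is not None else "sg"


def derive(order, ia):
    """Apply MOVE and AGREE in the given order and return T's number."""
    se = Item("SE", None)    # assumption: SE has underspecified number
    path = [se, ia]          # what T c-commands, from top to bottom
    t_number = None
    for op in order:
        if op == "MOVE":     # SE cliticizes on T and leaves the search path
            path = [x for x in path if x is not se]
        elif op == "AGREE":
            t_number = probe_number(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return t_number


ia = Item("los resultados", "pl")
print(derive(("MOVE", "AGREE"), ia))  # pl  (agreeing SE, as in (6a))
print(derive(("AGREE", "MOVE"), ia))  # sg  (non-agreeing SE, as in (6b))
```

The point of the sketch is that nothing differs between the two derivations except the order in which the same two operations apply.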

It remains to be discussed how this analysis accounts for the impossibility of person agreement in SE contexts (see §3.1). My hypothesis is that the specific featural configuration of SE is again responsible for this behaviour. When SE cliticizes on T before Agree, the person feature of SE values the person feature of T. This is not possible with number since the underspecified number feature of SE cannot provide any value. Consequently, when T probes, it only needs to check number with the IA. The steps are schematized in (22):

<sup>19</sup>A reviewer wonders about infinitive contexts in which SE is enclitic. This question, although very relevant, requires a more detailed description of the appearance of SE in bi-clausal contexts which for reasons of space I cannot develop here (see Mendikoetxea 1999: 1705–1715; Sánchez López 2002: 43–49).

<sup>20</sup>A reviewer is worried about this kind of analysis in that it revamps the notion of extrinsic rule-ordering. It is indeed an old idea that goes back to Chomsky (1965), but it is not clear to me that it should not have a role in current Minimalist theories. Georgi (2014) provides a very thorough argumentation about the topic and convincingly shows that intrinsic ordering cannot predict all attested orderings of Merge and Agree, from which she concludes that grammar requires both types of rule-ordering. It is also interesting to point out that OT is based on extrinsic rule ordering (as Georgi 2014: 253 also notes), which is not incompatible with the Minimalist framework (see Broekhuis & Woolford 2010).


This possibility raises non-trivial questions about cliticization that must be addressed in the future.

#### **4.1 Extension: non-agreeing dat structures**

Before I conclude, I want to extend my analysis to a similar case of agreement variation in Spanish found in dat-nom contexts. As argued in Arias & Fernandez-Serrano (In press), some Spanish speakers optionally obviate agreement (see (23b)), otherwise compulsory (see (23a)), between the verb and the IA of dat-nom psych-verb structures.

	- b. No no me me.dat gusta like.3sg las the funerarias funeral-homes 'I don't like funeral homes'

In this context, analyzing the lack of agreement in (23b) as a result of defective intervention is more straightforward since the intervener is a dat argument, exactly as is the case in Icelandic (see (17)). The difference with this language is, again, that Spanish does not show a person restriction on the IA, but a ban on nom pronouns when there is no agreement, as we see in (24b):

	- b. \*Le him/her.dat gusta like.3sg tú/nosotros/ellos. you.sg/we/they. 'S/he likes you/us/them.'

Therefore, the same analysis presented for SE can be adopted for this case, as (25) reflects:

(25) a. Agreement: MOVE > AGREE dat …T …<dat> …IA

b. Lack of agreement: AGREE > MOVE T …dat …IA

It is important to note that, while in agreeing SE only number agreement with the IA is allowed, in dat-nom contexts there is full φ-agreement between T and the IA (see (24a)). That said, this contrast is not necessarily problematic for the analysis since, unlike SE, dat clitics do not cliticize on T (D'Alessandro 2008: 128). This would prevent any interference between T and the IA once the dat has moved above T. In fact, some authors maintain that the dat of psych-verbs does not land in Spec,T at any point of the derivation because it is not a subject vis-à-vis Icelandic (Gutiérrez-Bravo 2006, Fábregas et al. 2017).

In summary, I have proposed that SE and dat clitics may trigger defective intervention effects in Spanish if they are *in situ* when Agree takes place. At the same time, the different agreement outcomes (full or partial agreement) depend on the featural configuration of these clitics. This analysis tries to shed some light on the role of optionality in grammar and in syntactic theory.

#### **5 Conclusion**

This paper has examined variation in SE structures in Spanish with a special focus on non-agreeing SE. Empirically, I have shown that non-agreeing SE seems to be acceptable with definite DPs and that it optionally alternates with the agreeing version. This seems to be accurate at least for European Spanish and more investigation is needed to include American varieties. Theoretically, I have maintained that there is only one SE structure with two possible agreement outcomes. Evidence for this comes from the impossibility of having acc clitics in SE contexts.

The analysis I have put forward considers SE as a defective intervener that blocks agreement if Agree happens before SE cliticizes. On the contrary, agreement is possible if SE raises before Agree. I have shown that this analysis correctly predicts that, in Spanish, there is no person restriction regarding the IA and that this analysis is consistent with variation in dat-nom contexts.

Further research is required to assess the hypotheses presented throughout the paper about the mechanism of strong pronouns licensing and nom Case assignment in Spanish, the impact of cliticization, and the role of optionality within grammar.

#### **Abbreviations**


### **Acknowledgements**

Some of the ideas presented in this paper were developed during a research stay at the University of Utrecht. I am grateful to Roberta D'Alessandro for her guidance and to Ángel Gallego for his comments on a previous version of this paper. I also want to thank the audience of LSRL50 for their useful suggestions and two anonymous reviewers for their comments and remarks for further research. This research has been supported by the Spanish Government: (i) "Redes de variación microparamétricas en las lenguas románicas" (FFI2017-87140-C4-1-P, PIs: Á. Gallego & J. Mateu); (ii) Predoctoral Grant FPU2016; (iii) mobility grant for predoctoral stays FPU/EST2018.

#### **References**


## **Chapter 6**

## **Object control into temporal adjuncts: The case of Spanish clitics**

Katie VanDyne<sup>a</sup>

<sup>a</sup>University of Illinois at Urbana-Champaign

Obligatory control into temporal adjuncts has traditionally been observed to be limited to subject control, rather than object control (Landau 2013, 2015, Boeckx et al. 2010, among others). A challenge to this generalization arises in Spanish. In Spanish, in-situ full DP objects and postverbal object clitics conform to the established pattern of objects not being able to establish control into an adjunct. However, I present novel data showing that when an object clitic is preverbal, it can control the subject position of a non-finite temporal adjunct. Moreover, when a clitic can control into an adjunct, subject control is not ruled out; there can be ambiguity between the choice of controller. I show how this data can be accounted for following the two-tiered theory of control. I account for the optionality between a subject or preverbal clitic controller by distinguishing between two available positions for the clitic within vP.

### **1 Introduction**

A longstanding productive area of research in generative syntax has been accounting for the distribution and interpretation of the null subject of non-finite clauses, i.e. controlled clauses (see for example Williams 1992, Hornstein 1999, 2000, Boeckx et al. 2010, Gallego 2011, Landau 2001, 2013, 2015, among many others). In working towards this goal, many sets of cross-linguistic data have played an important role. While historically much of the discussion is based on, and accounts for, control into complement clauses, control also occurs into adjunct clauses. Adjunct control is similar to complement control in that the subjects of matrix clauses can control the null subject of non-finite adjunct clauses.

Katie VanDyne. 2023. Object control into temporal adjuncts: The case of Spanish clitics. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 107–130. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525102


However, unlike complement control, where both matrix objects and subjects can be possible controllers (dependent upon the verb), object control is traditionally claimed to be disallowed in adjunct control structures (Landau 2013, 2015, Boeckx et al. 2010, among others). Observe this pattern in (1), where in (1c), only the matrix subject *John* can be coindexed with the adjunct subject, PRO.

(1) a. John promised Mary PRO to go to the party. (Subj. complement control)

b. John persuaded Mary PRO to go to the party. (Obj. complement control)

c. John<sub>i</sub> saw Mary<sub>j</sub> before PRO<sub>i,\*j</sub> leaving the party. (Adjunct control)

In Agree (Williams 1992, Landau 2001, Gallego 2011) and predication (Landau 2015, 2017) approaches to control, the lack of object control in (1c) is generally accounted for through a lack of c-command between the object and the adjunct. Following the approach in Landau (2015), the object (*Mary*) does not c-command the adjunct, and thus predication cannot be established between the two items. I return to the details of this account in §2.

Adjunct control structures in Spanish would be expected to pattern in the same way, in that matrix objects cannot control into an adjunct. This is indeed what is found in control contexts in Spanish: only subjects can control, as illustrated in (2). The intended controller is in bold.

	- b. \*Yo I besé kissed [**a** dom **mi** my **novia**] girlfriend antes before de of PRO PRO ponerse become.inf.refl.3sg celosa. jealous.f

'I kissed my girlfriend before she got jealous.'

However, when the object is realized as a preverbal clitic, rather than a full DP, there are deviations from the expected control patterns. A previously undiscussed phenomenon, and one crucially different from the examples in (1) and (2), is that object clitics do appear to be able to control into adjuncts, as in (3).

(3) **La** her besé kissed después after de of PRO PRO ponerse become.inf.refl.3sg celosa. jealous.f 'I kissed her after she got jealous.'

The example in (3) thus raises questions regarding how to account for these patterns, given that previous approaches have been designed to explain the lack of object control in examples like (1) and (2b). Furthermore, the presence of a preverbal clitic does not mandate that the clitic must be the controller. Both subject and object control appear to be available in these structures, exemplified by the subject control structure in (4), where the first-person null subject controls the adjunct subject position.

(4) *Pro pro* la her besé kissed después after de of PRO PRO ponerme become.inf.refl.1sg celoso. jealous.m 'I kissed her after I got jealous.'

The goal of this paper is to discuss in detail novel data of object clitics controlling the subject position of Spanish non-finite adjuncts. Moreover, I will show how these data can be accounted for following Landau's (2015) two-tiered theory of control and will provide an account to explain how there is optionality between subject and object control in an otherwise identical structure. I begin §2 with a brief background on Landau's (2015) theory of control. §3 takes a closer look at the range of data and the type of structures where a clitic can control the subject position of a non-finite adjunct. In this section I also present why these structures should be analyzed under the category of obligatory control (OC), rather than non-obligatory control (NOC). In §4 I show how clitic control can be accounted for following Landau (2015). Additionally, in this section I also discuss the apparent optionality in the choice of controller, as well as how to account for the lack of in-situ object clitic control in Spanish. §5 concludes the paper.

### **2 Background: Landau's (2015) two-tiered theory of control**

Given the extensive history of the literature on control, it is beyond the scope of this paper to detail the full history and evolution of the various syntactic and semantic approaches to control. I therefore will limit the discussion to the key aspects of Landau's (2015) two-tiered theory of control (TTC), the theory which I will adopt for explaining adjunct control by object clitics.<sup>1</sup>

In the TTC, control structures are divided into two subgroups: predicative control and logophoric control. Since Landau considers cases of right-edge, OC adjunct control to be predicative control, I focus only on these structures and set aside the derivation of logophoric control for this paper.<sup>2</sup> Focusing first on the derivation of complement control structures, which is the focus of the account, Landau claims that predicative control is derived via predication and movement. First, a copy of PRO from Spec TP is moved to Spec FinP (here, Fin is a predicative head within the left periphery, following Rizzi 1997). This crucial movement consequently creates a λ-abstract, turning the FinP projection into a predicate. The closest c-commanding DP to PRO saturates the predicate and is interpreted as the controller. This instance of predication is what establishes the control relationship. Finally, the features of PRO are valued at PF by either agreeing directly with the DP controller or by an indirect Agree between the controller DP and Fin. The features on PRO in Spec FinP are then shared with the lower copy of PRO via feature sharing, following Pesetsky & Torrego (2007). The structure of complement control by a subject, from Landau (2015: 26), is shown in Figure 1 for (5).

(5) John managed to stay healthy. (Landau 2015: 26)

Figure 1: Subject complement control (Landau 2015: 26)

Under Landau's derivation of predicative control, object control occurs in complements with implicative verbs that are semantically causative (such as *make*, *compel*, *force*). In object control constructions, the object, positioned in the subject position of a small clause (here an RP, which is a predicative or relator phrase), intervenes between the matrix subject and PRO and is the closest c-commanding DP to PRO. Only then can the object, rather than the subject, control PRO. The structure of object control, from Landau (2015: 29), is given below in Figure 2 for (6) with the controlling, intervening object in bold.

<sup>1</sup>While the focus of the present paper is not on the debate between theories of control, there is reason to think that the clitic control data support a theory like Landau's that relies on c-command to establish control, rather than a theory of control as movement (for example Hornstein 1999, 2000 and Boeckx et al. 2010). Crucially, under a movement theory of control, a derivation of control by object clitics in Spanish would likely be treated the same as control by in-situ (full DP) objects, which cannot control the subject of a temporal adjunct. While it is beyond the scope of this paper to go into more details, this is the reason that I have adopted Landau's account.

<sup>2</sup>There are also instances when NOC can occur in a right-edge adjunct, in which case Landau (2017) analyzes them as instances of logophoric control (see also Green 2018, 2019). Left-edge adjuncts, which largely pattern with NOC structures (although there are some exceptions), Landau also treats as logophoric control. The clitic control examples I am focusing on all are right-edge adjuncts and display OC, as will be argued in §3, and should thus be treated as predicative control.

(6) John forced Bill to stay home. (Landau 2015: 29)
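The way the controller is selected in predicative control can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Landau's formal system: the function name `controller` is hypothetical, and which DPs c-command PRO is stipulated by hand for each example, following the description in the text.

```python
def controller(c_commanders):
    """Return the DP closest to PRO among those c-commanding it.

    `c_commanders` lists the DPs that c-command PRO from structurally
    highest to lowest, so the closest one is the last element.
    Returns None if no DP c-commands PRO.
    """
    return c_commanders[-1] if c_commanders else None


# (5) John managed to stay healthy: only the subject c-commands PRO.
print(controller(["John"]))          # John

# (6) John forced Bill to stay home: the object intervenes between
# the matrix subject and PRO, so it is the closest c-commanding DP.
print(controller(["John", "Bill"]))  # Bill

# (1c) John saw Mary before leaving the party: the object does not
# c-command the adjunct, so only the subject is a candidate.
print(controller(["John"]))          # John
```

On this simplification, the difference between subject and object control reduces to which DPs are in a position to c-command PRO.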

While Landau (2015) focuses mainly on deriving complement control structures, there is a brief discussion of extending his analysis to cases of adjunct control. Landau, following Williams (1992), claims that right-edge adjuncts function as predicates and display obligatory control; in the TTC, these are considered to be instances of predicative control. Regarding the technical details of predicative adjunct control, Green (2018) brings up questions regarding how the λ-operator is passed up the tree, since FinP is not the sister of any argument. He also questions how exactly predication is mediated in the adjunct. Like Green, I assume that these details can be worked out, and that the predication does obtain. I refer the reader to Landau (2021), where a detailed analysis of adjunct control is provided. An example of the general structure for adjunct control following Landau (2015, 2017) is provided in Figure 3 for (7).

(7) John left after eating dinner.

Figure 2: Object complement control (Landau 2015: 29)

### **3 Control by object clitics**

In §3.1 I first provide a brief overview of the distribution and analysis of object clitics in Spanish that I will adopt and lay out the structure I assume for them. Then, in §3.2, I discuss in more detail the contexts in which clitic control into adjuncts can obtain in Spanish, followed by a discussion of the type of control displayed in these structures in §3.3.

#### **3.1 Background on Spanish clitics**

In Spanish, accusative clitics can be located in different positions, depending on the structure of the sentence. With finite verbs, clitics are obligatorily located in a preverbal position (8b). However, when the verb immediately preceding the clitic is an infinitive or gerund form of the verb, the clitic can optionally remain in-situ (8c) or can be positioned before the inflected verb (8d). The position in between the finite and non-finite verb is, however, not available for the clitic (8e). These positions are summarized below.

Figure 3: Adjunct control

	- b. Juan Juan lo it comió ate.3sg *t* . 'Juan ate it.'
	- c. Juan Juan está is.3sg comiendo-lo. eating-it 'Juan is eating it.'
	- d. Juan Juan lo it está is.3sg comiendo eating *t* . 'Juan is eating it.'
	- e. \*Juan Juan está is.3sg lo it comiendo eating *t* . 'Juan is eating it.'

Regarding the structural position of these clitics, I assume the clitics are originally merged in the internal argument position before moving into their preverbal position. Following phase theory (Chomsky 2000, 2001) and phase-based approaches to cliticization (Gallego 2016), I assume that the clitic is first moved through the edge of the vP phase, before moving to its final position. Making use of the notion of tucking-in following Richards (1997), I consider the object clitics to be merged in the argument position and then tucked into an inner specifier following Nevins (2011) and Kramer (2014) (however, see §4.3 for an update to this analysis). The basic structure of (8b) is illustrated in (9).

(9) [TP *pro* [TP lo comió [vP lo *pro v* [VP comió lo]]]]

#### **3.2 Clitic control data**

In this section I present more examples of clitic control and outline the range of structures and clitic types that allow for this type of adjunct control construction. For this paper, I am limiting the scope of the discussion to only right-edge (i.e. final) temporal adjuncts which are traditionally claimed to only allow subject (not object) control.<sup>3</sup> The example from (3) is repeated below in (10a), along with additional examples of clitics controlling the adjunct subject.

	- b. ¿Tu your novio boyfriend no didn't **te** you vio see.3sg antes before de of PRO PRO irte leave.inf.refl.2sg para for la the boda? wedding

'Your boyfriend didn't see you before you left for the wedding?'

c. Mi my papá dad **me** me llevó brought para to PRO PRO comenzar start.inf el the primer first día day de of escuela. school 'My dad brought me to start my first day of school.'


<sup>3</sup>Object and subject purpose adjuncts can display object control, where PRO is controlled by the goal or benefactive, as in (i).

As discussed in detail in Faraci (1974), Chomsky (1980) and Jones (1991), a key difference between purpose and temporal adjuncts is that purpose adjuncts are argued to be attached in a lower position, as VP (internal) adjuncts (see also Green 2019 for a discussion of adjunct height). This position would make *Mary* in (i) and *a detective* in (ii) the closest c-commanding controller. This lower position can account for why an object, and not the subject, is the obligatory controller.

As seen from these examples, third person clitics (10a), as well as second (10b), and first person clitics (10c) are all able to control into an adjunct. Note that while the controlled verb in both (10a) and (10b) is reflexive in order to show an overt realization of the features of the controller, this is not obligatory, as displayed in (10c) and later in (18), where the controlled verb is not reflexive. In these structures, PRO is the antecedent of the reflexive, and the reflexive pronoun thus indicates which argument is controlling PRO. While the data in (10) show that clitic control is possible, crucially, as was shown in (4), subject control is still possible into these Spanish temporal adjuncts. Observe the minimal pair in (11) where both subject and clitic control are possible.

	- I him saw.1sg before of PRO leave.inf.refl.3sg for Spain 'I saw him before he left for Spain.'

Thus, it appears that these clitic control structures involve some degree of optionality as to which argument, the subject or the object clitic, acts as the controller. I return to this optionality in §4.

Further, while the object clitic controls PRO in the structures in (10), it is important to note that this only appears to be possible when the object is a clitic, rather than a full DP, and crucially, only when the clitic is moved to a preverbal position. Observe that postverbal objects (both full DP objects and object clitics) cannot control, as shown in (12a) and (12b). Additionally, postverbal clitics in imperative structures are unable to establish control (12c), as are preverbal clitics in clitic climbing structures (non-clause-mate clitics) (12d).

	- b. \* Quiero want.1sg besar-la kiss.inf-her antes before de of PRO PRO ponerse become.inf.refl.3sg celosa. jealous.f 'I want to kiss her before she gets jealous.'

The full DP object, *mi novia*, in (12a) is unable to control the adjunct subject. Likewise, the example in (12b) shows that an accusative in-situ clitic also cannot be interpreted as the adjunct subject. The full DP object and postverbal clitic examples appear parallel to the English example in (1c), in that a lower internal argument is unable to control. I return to these ungrammatical examples, including the imperative structure in (12c) and the clitic climbing structure in (12d), where the clitic in a higher clause is unable to control, in §4.3. Note that in order for (12a) and (12b) to have the desired interpretation, both examples would require the use of the subjunctive in the adjunct verb, in which case the subject would be considered *pro*, since the verb would no longer be a non-finite form, as shown in (13).

(13) Besé kissed a dom mi my novia girlfriend después after de of que that *pro pro* se se pusiera became.subj celosa. jealous.f 'I kissed my girlfriend after she got jealous.'

The conclusion from this data is summarized in (14).<sup>4</sup>

(14) Only moved, clause-mate, pre-verbal clitics are potential object controllers of the subject position of non-finite adjuncts in Spanish.
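The generalization in (14) can be restated as a simple checklist. The following Python sketch is only an illustration and not part of the analysis: the `MatrixObject` record and its field names are hypothetical labels introduced here, and the items are coded by hand from the judgments reported for (10) and (12). The sketch restates the descriptive generalization; it does not explain it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixObject:
    """Hand-coded properties of a matrix object (hypothetical field names)."""
    is_clitic: bool    # clitic pronoun rather than a full DP
    preverbal: bool    # moved to a position before the inflected verb
    clause_mate: bool  # clause-mate in the sense of (14), i.e. no clitic climbing


def is_potential_controller(obj: MatrixObject) -> bool:
    """Restate (14): only moved, clause-mate, preverbal clitics qualify."""
    return obj.is_clitic and obj.preverbal and obj.clause_mate


# Items coded after the descriptions of (10) and (12) in the text.
te_10b = MatrixObject(is_clitic=True, preverbal=True, clause_mate=True)
full_dp_12a = MatrixObject(is_clitic=False, preverbal=False, clause_mate=True)
enclitic_12b = MatrixObject(is_clitic=True, preverbal=False, clause_mate=True)
imperative_12c = MatrixObject(is_clitic=True, preverbal=False, clause_mate=True)
climbed_12d = MatrixObject(is_clitic=True, preverbal=True, clause_mate=False)

for label, item in [("10b", te_10b), ("12a", full_dp_12a), ("12b", enclitic_12b),
                    ("12c", imperative_12c), ("12d", climbed_12d)]:
    print(label, is_potential_controller(item))
```

Only the item coded after (10b) passes all three conditions; each of the items coded after (12) fails at least one of them.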

#### **3.3 Clitic control is OC not NOC**

As a final discussion before proceeding to the analysis of clitic control, it is important to establish what type of adjunct control these clitic structures display, obligatory control (OC) or non-obligatory control (NOC). While OC structures are controlled by a local antecedent, NOC structures are those where PRO is not syntactically bound by a local antecedent in the structure, but rather PRO refers to a long-distance antecedent, a discourse referent, or an arbitrary referent (Landau 2013). Thus, NOC is subject to fewer syntactic constraints on the controller. Three typical examples of NOC, arbitrary control, long distance control, and discourse control (all from Landau 2013), are given in (15).

<sup>4</sup> It is important to mention another structure where a non-(nominative) subject, a dative experiencer, can control into temporal adjuncts, as in (i).

(i) Siempre always me 1sg.dat duele hurts la the cabeza head después after de of PRO/∗ PRO correr. run.inf 'My head always hurts after running.'

Because of the structural differences between these experiencer constructions and the examples in (10), I set aside these constructions. For a full discussion of control by dative experiencers see Landau (2009) as well as Landau (2021). These may fall under the category of NOC.

(15) a. Potatoes are tastier [after PRO boiling them.] (Arbitrary NOC)

In (15a) the subject of the matrix clause, *potatoes*, is not interpreted as controlling PRO; instead PRO, not being controlled by any syntactic antecedent, receives an arbitrary interpretation. In (15b) PRO refers to a long-distance antecedent by which it is not bound. Likewise, PRO in (15c) is not bound by an element in the structure but instead receives its interpretation from something salient in the discourse.

In this section, I address whether it is possible to analyze constructions like (10) as instances of NOC, rather than OC. Classifying these structures as NOC would be a way to avoid the questions about accounting for clitic control that were raised in §1. However, I show that this approach is not tenable: the clitic control structures do display the characteristics of OC.

Landau (2017) expands on his discussion of adjunct control from Landau (2015) and suggests that right-edge (final) adjuncts can be either OC or NOC. The central claim is that while OC right-edge adjuncts are derived via predication, NOC right-edge adjuncts are derived via logophoricity. The difference, Landau claims, between OC and NOC final adjuncts relies heavily on the presence (or absence) of a c-commanding controller. While OC PRO must be bound by a local antecedent and must be interpreted as a bound variable, NOC PRO may be a free variable and does not need to be bound by another element. It is also standardly claimed that NOC PRO is obligatorily [+human]. The summary of what Landau refers to as the OC/NOC signature is given in (16) and (17).

(16) The OC signature

In a control construction [...Xi ...[s PRO ...]...], where X controls the PRO subject of the clause S:

(17) The NOC signature

In a control construction [...[s PRO ...]...]:
	- a. The controller need not be a grammatical element or a co-dependent of S.
	- b. PRO need not be interpreted as a bound variable (i.e., it may be a free variable)
	- c. PRO is [+human]. (Landau 2013)

Thus, if one were to classify structures like (10) as displaying NOC, clitic control would not immediately be problematic for predication-based (or the earlier AGREE-based) theories of control. There are, however, examples of object clitic control structures that clearly do not fit into Landau's definition of the NOC signature.

Examining the first criterion, (16a), the clitic controller in examples like (10) is a co-dependent of S, satisfying that requirement of the OC signature. However, because an NOC controller may, but need not, be a co-dependent of S, this property alone does not rule out an NOC analysis of (10). Turning to the [+human] aspect of the NOC signature, observe (18), where a nonhuman object can control PRO.

	- b. Lo it cosechó harvested antes before de of PRO PRO florecer. flower.inf 'He/She harvested it before it flowered.'

<sup>5</sup>Landau (2013) defines a codependent as: "A 'dependent' of S is either an argument or an adjunct of S, thus (a) subsumes both complement OC, where the controller and S are co-arguments and adjunct OC."

According to Landau's NOC signature, NOC PRO should only be interpreted as human. However, in (18) an accusative, non-human clitic can control PRO, showing that these structures do not conform to the requirements of NOC. Moreover, if the structures in (10) were instances of NOC, rendering a c-command relationship unnecessary, it would not be immediately clear why postverbal clitics or non-clause-mate preverbal clitics cannot control. If the dependency were not local, and c-command were not a requirement, why would only preverbal clitics control? Therefore, following Landau's definition of the OC/NOC signature, there is strong support for treating these structures as OC.

In sum, from the data and discussion in this section, I conclude that in addition to matrix subjects being able to control adjunct PRO, preverbal clitics (whether 1st, 2nd, or 3rd person) are also able to control into an adjunct. Moreover, I have shown that these structures are instances of obligatory control, rather than non-obligatory control. This is crucial to the discussions in the upcoming sections, where I account for the clitic control data under the assumption that these are cases of obligatory control.

### **4 Clitic control in the TTC**

Because clitic control data like (10) are not discussed in previous theories of control, and in fact obligatory control by an object is not predicted to occur at all in temporal adjuncts, further explanation is needed to account for these structures. In §4.1, I discuss how clitic control examples can be analyzed under Landau's (2015) theory of control. In §4.2 I propose a way to account for the optionality between subject and object clitic control into adjuncts based on the position of the subject and clitic. Finally, in §4.3, I return to the ungrammatical clitic control examples mentioned in §3.2 and show how those can also be accounted for under the TTC.

#### **4.1 Accounting for clitic control**

Following the framework of the TTC presented in Landau (2015), the lack of object control in temporal adjuncts would likely be attributed to the fact that an internal argument in the matrix would not be able to c-command the adjunct, and thus no predication could be attained between PRO and the controller. While this would correctly rule out the ungrammatical examples involving full DP objects (in both English and Spanish), there still appears to be room in this theory to also account for the examples where an object clitic can successfully control into an adjunct.<sup>6</sup> In Landau's theory, it is the final position of the object that is argued to block object control in English adjuncts. When no c-command relationship can be established between the controller and controlee under this theory, predication, and consequently control, fail to occur. However, recall from the discussion in §3 that Spanish (preverbal) clitics are moved to a higher position. From this higher position in Spec vP, the clitic is able to c-command and saturate the predicate of the adjunct subject, which is what appears to allow the object control to occur. In this case, predication (and thus control) can be established between the controller (the clitic) and PRO. Moreover, on the account that the clitic tucks into an inner specifier of vP, there appears to be no other argument intervening between the clitic and PRO. Observe the structure of (10a) shown below in Figure 4 for (19).<sup>7</sup>

(19) La besé después de ponerse celosa.

The structure in Figure 4 can thus account for the phenomenon of clitics controlling. However, we are then left with a larger question regarding how both subject and clitic control are possible in an otherwise identical structure (as in (11)). While the structure and analysis in Figure 4 demonstrate how preverbal Spanish clitics are able to establish predication with PRO, this analysis consequently predicts that only the clitic could act as a controller of PRO, given that it intervenes between the matrix subject and PRO. This then raises questions regarding the structure of subject control into adjuncts and how it is possible for either the subject or object to be the controller. I now turn to discussing how to account for this possibility of both subject and clitic control.

<sup>6</sup> Janke & Bailey (2017) claim that not all temporal adjuncts are obligatorily controlled by the subject but that pragmatics can influence the choice of controller to instead be the matrix object. In an experimental study, they found that in a control structure with a final temporal adjunct, when strongly primed in the preceding discourse, almost half of speakers understood the matrix object to be the controller in an example like (i).

(i) Hermione is looking after the birds. Hermione takes out the food. Ron tapped Hermione while feeding the owl.

Because both controller options are [+human], it is not immediately clear that this is an OC interpretation. However, if these can be analyzed as displaying OC with an object controlling, this may then serve as a counterexample to the traditional generalization that objects cannot control into adjuncts.

<sup>7</sup>Thanks to Idan Landau for his input on this structure.

Figure 4: Clitic control

#### **4.2 Co-existence of subject and clitic control into adjuncts**

Recall that subject control is also possible in the structures with preverbal clitics, as was shown in the minimal pair in (11), and repeated below in (20).

	- b. (Yo) I **lo** him vi saw.1sg antes before de of PRO PRO irse leave.inf.refl.3sg para for España. Spain 'I saw him before he left for Spain.'

The possibility of both subject and clitic control suggests that there is some amount of optionality in the controller of an obligatory control structure. This raises the questions of how this optionality arises and what determines which argument (the subject or the object clitic) is the controller. The idea of optionality in an obligatory control construction may at first seem unexpected, given that in other examples of obligatory complement and adjunct control, there never appears to be optionality between subject and object control in the same structure.<sup>8</sup> Observe (21), from Landau (2015), where in the presence of a matrix object in a predicative control structure, the object must control.

(21) a. John managed PRO to stay healthy. (Subject control)

b. John forced Bill PRO∗/ to stay home. (Object control)

However, the adjunct control structures are quite different than those of complement control in (21). With a subject control verb like *manage* in (21a), the only available interpretation is with the subject *John* acting as the controller. *John*, the subject, is the only c-commanding DP. There is no competing or intervening DP, and thus the subject controls PRO. On the other hand, in (21b), the matrix object must be the controller of PRO. For predicative control, all object control structures (primarily those non-attitude verbs that are semantically causative) are always controlled by the object. As discussed in §2, in these cases the predicative head (FinP) applies to the small clause that contains the matrix object. The object in these cases is the closest, c-commanding DP which saturates the predicate (PRO) and will always control. Thus, because of the structure of complement control constructions, no optionality can arise between subject and object control.

<sup>8</sup>There are some adjunct control structures that show optionality in that they can display either OC or NOC (discussed in detail in Green 2018, 2019). Observe (i), from Green (2018), where the referent of PRO could be the subject (OC), or an antecedent outside of the sentence (NOC).

(i) The pool was the perfect temperature after PRO/ being in the hot sun all day.

However, given the discussion in §3.3, where I have claimed that the clitic control examples do behave as OC, these structures are different than those like (i) that show alternation between OC/NOC, as this alternation appears to be one between OC subject control and OC object control.

I propose that the optionality in subject/clitic control arises from the position of the clitic. Until now, I have assumed for this analysis that the clitic moves to Spec vP, tucked in under the external argument, which is also in a specifier of vP. However, the two different semantic interpretations (subject control vs object clitic control) lead to the natural suggestion that there are two distinct derivations for these structures. In both derivations, the subject would move first, to the specifier of vP. Then, in the derivation resulting in clitic control, the clitic would tuck in to an inner specifier of vP, as was claimed earlier. In this derivation, the clitic would be the closest c-commanding DP to PRO and would thus saturate the predicate in the adjunct and establish control. However, I argue that an alternate derivation is also possible which results in subject control. In this derivation, instead of tucking in, the clitic would move to an outer specifier of vP. Here the subject would instead be the closer DP to PRO and would thus serve as the controller. Following Richards (1997), both of these derivations (movement to an outer specifier or tucking into an inner specifier) in principle should be possible, and both respect notions of cyclicity as outlined in Chomsky (1995). This possibility of optionality in the derivation (i.e. clitic movement to an outer specifier or tucking into an inner specifier) also conforms to the notion in Chomsky (2001: 34) that optionality is allowed if it leads to a different outcome, i.e. a difference in the interpretation (in this case subject vs object control).<sup>9</sup> Simplified versions of these two derivations are shown in Figure 5a and Figure 5b, using the examples from (20).<sup>10</sup>
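The two derivations can be made concrete with a small sketch. The Python code below is an illustration under simplifying assumptions, not an implementation of the TTC: the specifiers of vP are modeled as a list ordered from outermost to innermost, and "closest c-commanding DP to PRO" is approximated as the innermost specifier. The function names and the string labels for the arguments are hypothetical.

```python
from typing import List


def build_vp_specifiers(subject: str, clitic: str, clitic_tucks_in: bool) -> List[str]:
    """Return the specifiers of vP, ordered from outermost to innermost.

    The subject occupies a specifier of vP first. The clitic then either
    tucks in beneath it (inner specifier) or moves above it (outer specifier).
    """
    specifiers = [subject]
    if clitic_tucks_in:
        specifiers.append(clitic)     # inner specifier, below the subject
    else:
        specifiers.insert(0, clitic)  # outer specifier, above the subject
    return specifiers


def controller_of_adjunct_pro(specifiers: List[str]) -> str:
    """Assumption: the innermost specifier is the closest DP to adjunct PRO."""
    return specifiers[-1]


# The minimal pair in (20): a null 1sg subject and the clitic 'lo'.
clitic_control = build_vp_specifiers("pro(1sg)", "lo", clitic_tucks_in=True)
subject_control = build_vp_specifiers("pro(1sg)", "lo", clitic_tucks_in=False)

print(controller_of_adjunct_pro(clitic_control))   # lo
print(controller_of_adjunct_pro(subject_control))  # pro(1sg)
```

In this toy model, the only difference between the two readings of (20) is the value of `clitic_tucks_in`, which mirrors the claim that a single choice point in the derivation yields the two interpretations.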

If the existence of both subject and clitic control is made available from the possibility of two different derivations as I have suggested above, instances of optionality between a clitic and subject would be expected to be found in other structures, outside of the adjunct control examples. One such situation is observed with reflexive binding, where both the clitic and subject are able to serve as the antecedent to the reflexive pronoun, shown in (22).

	- b. Le him hablé spoke.1sg de of sí-mismo. himself 'I talked to him about himself.'

<sup>9</sup>An alternative explanation for the alternation between subject and clitic control was suggested by an anonymous reviewer, wherein if the clitic and subject are both in multiple specifiers, they would remain equidistant from PRO and neither would block the other. While this would be a simpler solution, this type of optionality would not be expected under an analysis in which control is derived via predication, like the TTC. With predication, it is expected that only one controller, the closest, will saturate the predicate and establish control.

<sup>10</sup>For reasons of clarity, and in order to focus on the clitic movement, I have simplified the structures in Figure 5. Here, I adopt an account of Spanish subjects in which overt, preverbal subjects are found in a higher position than pro, along the lines of Cardinaletti & Roberts (1991) and Cardinaletti (1997). Specifically, I follow the account for Spanish subjects from Suñer (2003), who makes use of multiple specifier positions and proposes that Spanish overt subjects are in an upper specifier of TP, higher than null subjects. This structure would be as is shown in (i).

(i) [TP yo [TP pro [ V...]]]

Thus, in these cases, if the subject is pronounced, it would be moved to a higher specifier of TP, but the lower deleted copy/trace in the Spec vP position is what saturates the predicate and establishes control. With this analysis of subjects taken into account, the derivation results in the desired word order.

Figure 5: Clitic vs subject control derivations

c. \*Hablé spoke.1sg a to Juan Juan de of sí-mismo. himself 'I talked to Juan about himself.'

In the examples in (22), a local c-commanding antecedent is needed to bind reflexives. In (22a), the first-person reflexive anaphor *mí mismo* is bound by the first-person (null) subject. In (22b), the third-person reflexive *sí mismo* is bound by the third-person clitic, *le*. Finally, in (22c), the third person reflexive anaphor is not licensed due to the lack of an appropriate binder that has matching features. The full DP *Juan* does not c-command the anaphor, leading to the ungrammaticality. I thus conclude from (22) that there is a choice between the two potential binders, either the clitic or the subject. This suggests that, similar to the adjunct control cases, with reflexive binding, there are two derivations possible. Either the clitic tucks in, and is the closest DP for control/binding, or the clitic moves to an outer specifier, and the subject then is the closest DP. Therefore, the optionality between the two derivations does not appear to be limited to just the adjunct control structures.

Finally, the proposal that the clitic and the subject are both possible controllers, depending on the derivation, should not cause a problem under the framework set up in the TTC. There seems to be no theoretical motivation to exclude the possibility of two DPs being potential controllers.<sup>11</sup> I thus conclude that while it may not be seen in other cases of obligatory control, with the appropriate structure, there is the possibility of controller optionality in adjunct control. This still conforms to the TTC as there is nothing that explicitly rules out the possibility for optionality.

<sup>11</sup>A potential parallel would be in cases with split control, where the reference of PRO includes both the subject and the object, as well as partial control, where the reference of PRO is only partially included in the controller. Observe an example of partial control in Spanish, below.

(i) La her abracé hugged después after de of PRO PRO reunirnos meet.inf.1pl por for primera first vez. time 'I hugged her after we met for the first time.'

However, both split and partial control, following Landau (2015), involve logophoric control rather than predicative control (since, per Landau, predication cannot be partial). These do appear different than the earlier examples of clitic control, crucially because a parallel structure, but with an in-situ direct object, is also grammatical.

(ii) Abracé hugged a dom mi my hermana sister después after de of PRO PRO reunirnos meet.inf.1pl por for primera first vez. time 'I hugged my sister after we met for the first time.'

Due to the differences in the structure of predicative and logophoric control though, crucially that logophoric control involves variable binding which gives the reference of PRO to either the addressee or author (or in the case of partial control, both), this is not a direct parallel.

#### **4.3 Explaining the ungrammatical clitic control examples**

Lastly, in addition to explaining the grammatical clitic control examples in (10), the TTC can also successfully account for the ungrammatical object control examples, as shown in (12), and repeated below in (23).

	- b. \* Quiero want.1sg besar-la kiss.inf-her antes before de of PRO PRO ponerse become.inf.refl.3sg celosa. jealous.f 'I want to kiss her before she gets jealous.'
	- c. \* Bésa-la kiss-her antes before de of PRO PRO irse. leave.inf.refl.3sg 'Kiss her before she leaves.'
	- d. \* Lo him logré managed conocer meet.inf antes before de of PRO PRO graduarse. graduate.inf.refl.3sg 'I managed to meet him before he graduated.'

The examples in (23a) and (23b) are accounted for straightforwardly in the TTC. As is suggested for the ungrammatical object control examples in English in (1), in (23a) and (23b) the in-situ object or clitic is likewise not in a position to c-command PRO in the adjunct. Thus, under this theory, given that predication is the crucial step in establishing control and an in-situ object cannot c-command the adjunct, these arguments are unable to establish predication and thus are unable to control.<sup>12</sup> Once again, the positioning of the object clitic in Spec vP is crucial in allowing control to occur into an adjunct. I acknowledge that there may be further questions regarding the topic of clitic positions, particularly for the postverbal clitics (23b), as an anonymous reviewer points out. Given the considerable literature on Spanish clitics, and the lack of one established or standard account, in this paper I am assuming that the final position of postverbal clitics is low, and not in a position where it is in competition to be controller. However, note that even if a postverbal clitic is analyzed as having a final position in TP, this will be too high a position to control from, given that the subject (in Spec vP) is always expected to intervene. The topic is one that deserves more attention, and I set a more complex discussion of the issue aside as a question for future research.

<sup>12</sup>There are various accounts proposing that objects with differential object marking (DOM) in Spanish are moved to a higher position, such as Spec αP (between vP and VP) in López (2009), Spec vP in Torrego (1998), or even higher to Spec DatP in Rodríguez-Mondoñedo (2007). While it is outside of the scope of this paper to develop a specific account of DOM objects, due to the lack of full-DP objects, including DOM objects, controlling into adjuncts, I assume an account of DOM objects in which the adjunct is adjoined higher than whatever the final position of the DOM object ultimately is.

Regarding (23c), approaches like Rivero & Terzi (1995) claim that the obligatory postverbal position of the clitic in imperative structures arises from verb movement to C, which occurs around the raised clitic. If this account is correct, the ungrammaticality of (23c) is unexpected, since the clitic is also found in TP in these structures. I follow Terzi (1999), who argues that in imperative structures, the clitic adjoins to T and the verb left-incorporates into the clitic. It is then the verb and clitic complex that moves up to C, accounting for the clitic's postverbal position. In this case, the clitic would then be higher in T/C, and the subject would be the closest controller, resulting in control only by the subject.

Finally, turning to (23d), where there is a complement control structure in the matrix clause and the clitic climbs to be positioned before the inflected verb, the clitic is also unable to control into an adjunct. While I assume this arises due to the complications with the additional complement subject control clause in the matrix (*Lo logré PRO conocer antes de PRO graduarse*), I set aside a full account of this structure for future research.

Observe that from this data it can be concluded that while the object must be in a higher position (i.e. a preverbal clitic, rather than an in-situ full DP object or an in-situ object clitic), it also cannot be too high (as seen in the imperative example). That is, a specifier of vP is the crucial position. This can also be seen in examples with objects moved to the left periphery, where they are too high to control. For example, in interrogative structures like (24), the object is moved to Spec CP, a position from which it could c-command PRO in the adjunct. However, only the subject is a possible controller.

(24) ¿A dom quién who viste see.2sg t t antes before de of PRO PRO ir[te/\*se] leave.inf.[refl.2sg/refl.3sg] para for España? Spain? 'Who did you see before (your) leaving for Spain?'

When the object is moved to Spec CP, there is no derivation in which the object is in a position to control. The subject must be the closest c-commanding DP and is the only possible controller, as it would always intervene between the higher object and adjunct PRO. This reinforces the obligatory nature of the clitic being positioned in Spec vP as a precondition of being a potential controller of adjunct PRO.

### **5 Conclusion**

The goal of this paper was to contribute the novel observation that object clitics can control into Spanish adjuncts to the empirical discussion of adjunct control. Unlike what has been generally claimed, that only subjects can control into temporal adjuncts, preverbal, clause-mate object clitics in Spanish are potential controllers. Interestingly, the structures that allow clitic control alternatively allow for subject control as well. Thus, the dichotomy of when subject and object control occur appears to not be as strict as in complement clauses. I have argued that this optionality in the choice of controller is due to the position of the clitic, which has the option of either tucking into Spec vP or moving to an outer specifier of vP. When the clitic tucks in, it is the closest c-commanding DP to adjunct PRO and establishes control, but when it is moved to an outer specifier, the subject is closest to the adjunct and is the argument that establishes control.

### **Acknowledgements**

Thanks to two anonymous reviewers as well as audiences from LSRL 50 and Going Romance 34 for helpful feedback. Special thanks to Jonathan MacDonald, Aida Talić, Jefferey Green, Idan Landau, Yvette Bandín, Almike Vázquez-Lozares, Dulcinea Muñoz Gómez, and Fabiola Fernandez Doig for data and discussion. All errors are my own.

### **References**


Cardinaletti, Anna & Ian Roberts. 1991. Clause structure and X-second. Ms.

Chomsky, Noam. 1980. On binding. *Linguistic Inquiry* 11(1). 1–46.

Chomsky, Noam. 1995. *The minimalist program*. Cambridge, MA: MIT Press.


Hornstein, Norbert. 1999. Movement and control. *Linguistic Inquiry* 30(1). 69–96.

Hornstein, Norbert. 2000. *Move! A minimalist theory of construal*. London: Wiley-Blackwell.


Landau, Idan. 2001. *Elements of control: Structure and meaning in infinitival constructions* (Studies in Natural Language and Linguistic Theory 51). Dordrecht: Springer.


tic Inquiry Monographs), 93–102. Cambridge, MA: MIT Working Papers in Linguistics.


## **Chapter 7**

## **Overt vs. null subjects in infinitival constructions in Colombian Spanish**

Kryzzya Gómez<sup>a</sup>, Maia Duguine<sup>b</sup> & Hamida Demirdache<sup>a</sup>

<sup>a</sup>LLING-Université de Nantes/CNRS <sup>b</sup>IKER-CNRS

Standard approaches predict complementary distribution between referentially free (overt/null) subjects and referentially dependent PRO-type null subjects. This generalization is challenged by Colombian Spanish non-finite adjuncts, which allow both overt subjects and referentially free null subjects. We uncover an intricate pattern of distribution and interpretation along two criteria: Obligatory *vs*. Non-Obligatory Control (OC *vs.* NOC) and whether the controllee is silent or an obligatory-controlled overt pronoun (covert *vs*. so-called "overt PRO"). The distribution of sloppy readings with null subjects provides arguments for analyzing NOC as DP-ellipsis. We also show that both covert and overt PRO display the canonical diagnostics of OC except in one context. While they both only allow bound variable construals under ellipsis, overt PRO also allows co-reference when its controller is associated with focus. This paradox follows on the assumption that while both null and overt anaphors must be syntactically bound, only null anaphors are necessarily semantically bound.

### **1 Introduction**

According to the standard theory, overt nominative subjects and referentially free null subjects of the *pro*-type are licensed by finite INFL/T. As such, they are excluded from non-finite clauses, and thus not expected to alternate with PRO in this position (Williams 1980, Chomsky 1981, Rizzi 1986, Lasnik & Uriagereka 1988, Miller 2002). This set of standard assumptions is stated in (1), adapted from Szabolcsi (2009). See also Rigau (1995), Mensching (2000), Barbosa (2019), Livitz (2011), Corbalán (2018), a.o. and references therein for extensive discussion.

Kryzzya Gómez, Maia Duguine & Hamida Demirdache. 2023. Overt vs. null subjects in infinitival constructions in Colombian Spanish. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 131–156. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525104

(1) Non-finite clauses exclude:

	- (i) Overt (nominative) subjects.
	- (ii) Referentially free null subjects (*pro*).
	- (iii) Overt controlled subjects.<sup>1</sup>

However, as has been reported in the literature, the predictions of the standard theory in (1) are empirically falsified. In particular, overt subjects alternating with controlled null (PRO-type) subjects have been shown to be allowed in many languages. The pioneering work by Piera (1987) and Lipski (1994) for instance shows that complement and adjunct infinitives allow overt subjects in different varieties of Spanish (2)–(3):<sup>2</sup>


Even more strikingly, although it is theoretically impossible for an infinitival subject that exhibits all the diagnostics of Obligatory Control (OC) to be overt, overt obligatory-controlled pronominal subjects are attested cross-linguistically. Following Livitz (2011), we henceforth refer to the latter as "overt PRO" (see also Szabolcsi 2009).

The novel contribution of this paper is twofold. First, it extends the empirical domain of overt PRO discussed in the literature solely in the context of complement clauses to adjunct clauses. Second, it applies a battery of diagnostic tests to establish whether overt pronominals in infinitival subject positions instantiate (or not) OC (in line with Duguine 2013).

The focus of this paper is on non-finite adjunct clauses which allow overt preverbal subjects. We show that overt and null subjects in infinitival adjunct clauses in (Andean) Colombian Spanish (henceforth CS) exemplify three systematic patterns of exception to the standard generalizations in (1).<sup>3</sup>

<sup>1</sup>We borrow this terminology from Szabolcsi (2009) and Livitz (2011) respectively.

<sup>2</sup>For discussion of this pattern in Spanish, see also Rigau (1995), Torrego (1998), Mensching (2000), Zagona (2002), Perez Tattam (2007), Schulte (2007), Herbeck (2011), Corbalán (2018), González (2020). See also Szabolcsi (2009) on Hungarian, Italian, European Spanish, Portuguese, Romanian and Modern Hebrew, Sundaresan & McFadden (2009) on Tamil, Borer (1989) on Korean and Duguine (2013) on Basque.

The first pattern can be observed in infinitives introduced by the temporal-causal preposition *al*. It allows both overt DPs (pronouns or R-expressions) and null subjects of the PRO-type, which, as we shall establish in §2.1, meet all the diagnostics of OC.<sup>4</sup> This pattern is illustrated in (4). Overt subjects (also) allowed in *al*-infinitives violate the ban on overt subjects in (1(i)).

(4) Juan<sub>i</sub> sería feliz [al José / él<sub>i/j</sub> / PRO<sub>i/∗j</sub> dejar la casa].
Juan be.cond happy in.the José / he.nom / PRO leave.inf the house
'Juan would be happy on leaving the house.'

The second pattern is instantiated in infinitive adjunct clauses headed by the preposition *sin* 'without'. The latter allow for both overt subjects (pronouns or R-expressions) and *pro*-type subjects, as illustrated below.<sup>5</sup>

(5) María<sub>i</sub> dejó de trabajar [sin Rosa / ella<sub>i/j</sub> / pro<sub>i/j</sub> decir nada].
María stopped of work.inf without Rosa / she.nom / pro say.inf nothing
'María stopped working without (Rosa/her) saying anything.'

The reference of the null infinitival subject in (5) is free and (5), as such, violates the ban on referentially free null subjects (1(ii)), in addition to the ban on overt subjects (1(i)). As discussed in §2.2, we follow Hornstein (1999) in assuming null subjects in Non-Obligatory Control (NOC) constructions to be *pro* rather than PRO. Importantly, however, we provide an argument from the distribution of sloppy readings for generalizing Duguine's (2013, 2014) DP-ellipsis analysis of *pro* to NOC contexts. The last pattern is that of infinitive complements selected by the preposition *para* 'for', illustrated below:

<sup>3</sup>CS infinitival adjuncts allow overt subjects in preverbal position, a pattern commonly found across Caribbean varieties of Spanish (Suñer 1986, Dauphinais Civitello & Ortiz López 2016, González 2020). In Contemporary European varieties, overt subjects are typically restricted to the postverbal position (RAE-ASALE 2009). In Old Spanish, however, they were accepted in preverbal position (Mensching 2000, Corbalán 2018), while in other colloquial varieties, there appears to be no preference for the pre/postverbal position (Mensching 2000, Gallego 2010, Herbeck 2011). Scholars moreover agree that the overt subject appearing in nonfinite contexts in Spanish bears nominative case regardless of its pre/postverbal position (Piera 1987, Schütze 1997, Mensching 2000).

<sup>4</sup> See Rico (2016) on the temporal-causal properties of the complementizer *al*.

<sup>5</sup>Here we use PRO and *pro* as descriptive terms to conventionally refer to referentially dependent vs. free null subjects.

(6) Juan<sub>i</sub> se fue [para \*María / él<sub>i/∗j</sub> / PRO<sub>i/∗j</sub> estar feliz].
Juan cl.3 left for María / he.nom / PRO be.inf happy
'Juan left in order to be happy.'

Besides the expected PRO-like null subject, *para*-infinitives also allow overt subjects. But unlike in the two previous cases, the reference of the overt subject is not free – that is, only pronominal subjects anaphorically dependent on a local c-commanding antecedent are allowed to be overt. *Para*-infinitives thus violate the ban on overt controlled subjects (1(iii)), and as a consequence, the ban on overt subjects (1(i)), too. Following Livitz (2011), we call the overt controlled subject pronoun in (6) "overt PRO". Although overt PRO in complement clauses has drawn pointed attention in the literature (Piera 1987, Mensching 2000, Szabolcsi 2009, Livitz 2011, Livitz 2014), this is to our knowledge the first discussion of overt PRO in adjunct clauses (though see Duguine 2013 for a first approach in Basque).

The present paper explores the distribution and interpretation of subjects across these three patterns of infinitival adjunct clauses. We provide novel arguments showing that null and overt PRO do not pattern alike according to the standard tests for pronominal interpretation, namely, the ellipsis and association-with-focus tests. As is well-known, while the relation holding between a pronoun and its antecedent comes out as either Bound Variable Anaphora (BVA) or coreference under these tests, the relation between null PRO and its antecedent/controller comes out as BVA exclusively, as illustrated by the paradigm in (7) from Fodor (1975: 133–134).<sup>6</sup> The contrast in the interpretation of (7a) with a PRO subject in the gerund and (7b) with a possessive pronominal subject instead is fleshed out in (8)–(9) adapted from Fodor (1975: 135). Note that gerundive complements are OC constructions (Hornstein 1999).

(7) a. Only Churchill remembers [PRO giving the speech about blood, toil, tears, and sweat].

b. Only Churchill remembers [his giving the speech about blood, toil, tears, and sweat].

(8) a. and no other (λx (x remembers x giving the speech)). *BVA*

<sup>6</sup>Here we present this contrast as it is observed in the association-with-focus test. §4 presents the same contrast in the VP-ellipsis test.

(9) a. and no other (λx (x remembers x giving the speech)). *BVA*

b. and no other (λx (x remembers Churchill giving the speech)). *Coreference* (his=Churchill)<sup>7</sup>

Example (8a) can only be understood to mean that no individual other than Churchill remembers himself giving the speech. This is captured by assuming that the VP property attributed to Churchill (and asserted to hold of no other individual) is that of remembering *oneself* giving the speech, where PRO is interpreted at LF as a variable bound by its antecedent (via λ-abstraction), that is, as semantically bound (cf. Heim & Kratzer 1998, Büring 2005).

Now, (7b) with the possessive pronoun allows the very same BV reading as in (9a), but can also be construed as in (9b). Here, the VP property attributed to Churchill (and asserted to hold of no other individual) is that of remembering *Churchill* giving the speech. This construal arises because the overt pronoun can be understood as coreferent with the NP Churchill rather than being semantically bound. That the reading in (9b) is unavailable with PRO is taken to show that the relation that holds between (OC) PRO and its antecedent can only be BVA, not coreference anaphora.
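The difference between the two construals can be spelled out schematically. The following is only an illustrative sketch: the constant $c$ (for Churchill) and the predicate names $\mathit{remember}$ and $\mathit{give}$ are assumptions introduced here for exposition, and the contribution of *only* is simplified to the exclusion of alternatives to the focused subject, setting aside the division between what is presupposed and what is asserted.

```latex
\begin{align*}
\text{BVA, as in (9a):}\quad & P = \lambda x.\,\mathit{remember}\bigl(x, \mathit{give}(x)\bigr) \\
& P(c) \wedge \forall y\,\bigl[\,y \neq c \rightarrow \neg P(y)\,\bigr] \\[4pt]
\text{Coreference, as in (9b):}\quad & Q = \lambda x.\,\mathit{remember}\bigl(x, \mathit{give}(c)\bigr) \\
& Q(c) \wedge \forall y\,\bigl[\,y \neq c \rightarrow \neg Q(y)\,\bigr]
\end{align*}
```

On this rendering, $P(c)$ and $Q(c)$ are equivalent; the two properties come apart only in what is denied of the alternatives $y$, which is why focus on the subject is needed to tease the readings apart.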

Importantly, however, as we shall see in §2.3, overt PRO, unlike null PRO, patterns like a standard pronoun on the association-with-focus test, in allowing for both coreferential and BV construals, but just like null PRO on the ellipsis test, in allowing solely BV construals. There is therefore an unexpected and seemingly paradoxical distribution of interpretations. We show that these facts follow straightforwardly from the Anaphor Generalizations that we state in (10):

	- (i) Both null and overt anaphors need to be syntactically bound.
	- (ii) Overt anaphors can be semantically bound, null anaphors *must* be semantically bound.

This paper is organized as follows. §2 establishes the properties of both overt and null subjects in light of the tests distinguishing OC from NOC across the three types of adjunct infinitival clauses introduced above. §3 provides an analysis of NOC, as instantiated by the *sin*-pattern. Taking as a point of departure Hornstein's (1999) characterization of NOC null subjects as *pro*, we extend Duguine's (2013, 2014) DP-ellipsis analysis of *pro* to NOC contexts, providing a compelling argument from the distribution of sloppy readings of null subjects. §4 discusses what we call the Overt vs. Covert PRO paradox and develops an account in terms of the Anaphor Generalizations in (10). §5 concludes the paper.

<sup>7</sup>The notation here indicates that 'his' and 'Churchill' have the same referent.

### **2 Diagnostics for (Non) Obligatory Control**

This section seeks to establish a more fine-grained description of the interpretive properties of the null subject across our three classes of adjunct clauses. A battery of tests has been put forth in the literature to diagnose whether control is obligatory or not (Williams 1980, Hornstein 1999, Landau 2000, 2013, Baltin et al. 2015). OC and NOC can be distinguished as follows. Obligatorily controlled PRO requires a local and c-commanding antecedent for the null subject and, moreover, only allows BV readings under the two tests for pronominal interpretation (ellipsis and association-with-focus). In contrast, the null subject of NOC constructions does not require an antecedent (and if there is one, it need not be local or c-commanding), and allows both BV and coreferential construals under the ellipsis and association-with-focus tests.

Crucially, we apply these tests not only to the null subject of our three nonfinite adjunct clauses but also to the overt pronominal subject appearing in *para*-infinitives which, recall from (6), displays restrictions on its interpretation similar to that of (null) PRO. The picture that emerges is quite an intricate one, with three levels of variation: (i) adjuncts differ from one another with respect to whether they enforce OC or not (cf. also Landau 2013); (ii) while besides null PRO, certain OC adjuncts allow overt subjects (pronouns or R-expressions) (e.g. *al*-infinitives), others only allow overt PRO (e.g. *para*-infinitives); and (iii) overt and null PRO yield conflicting results with respect to the tests for pronominal interpretation (and as such with respect to the diagnostics for OC vs. NOC) (e.g. *para*-infinitives).

#### **2.1** *Al***-infinitives**

Example (4) above shows clearly that while the overt subject of an *al*-infinitive is referentially free, its null subject is referentially dependent. The diagnostic tests teasing apart OC vs. NOC, applied in (11) to (14), show that *al*-infinitives with silent subjects involve OC.<sup>8</sup>

<sup>8</sup>The judgments for sloppy/strict readings in Colombian Spanish reported in this paper are taken from an experimental protocol carried out with 36 native speakers and using a Truth Value Judgment task (Gómez In progress). This protocol showed that both *al*-infinitives, and *para*-infinitives in §2.3, were systematically rejected on a strict reading, but accepted on a sloppy reading – in contrast to *sin*-infinitives (§2.2, Footnote 10).


	- a. ✓Sloppy reading *(BVA)* María will be happy when María leaves the house.
	- b. ✗Strict reading *(Coreference)* María will be happy when Juan leaves the house.
	- a. ✓*BVA* No, Karla also fell when she herself was taking the train. Karla (λx(x also fell when x was taking the train))
	- b. ✗*Coreference* No, Karla also fell when Léa was taking the train. Karla (λx(x also fell when she was taking the train)) (she=Léa)

Example (11) shows that the null infinitival subject needs to be c-commanded by its antecedent, and (12) shows that long-distance antecedents are not allowed. The ellipsis test in (13) shows that the null subject allows a sloppy (that is, BV) reading, but crucially not a strict (coreferential) reading. The association-with-focus test in (14) confirms that it only allows a BV construal.<sup>9</sup> In sum, *al*-infinitives with null subjects display all the diagnostic properties of OC PRO.

<sup>9</sup> Spanish doesn't admit VP-ellipsis, but allows ellipsis of larger structures (Dagnac 2010, Saab 2010).

#### **2.2** *Sin***-infinitives**

Whereas *sin*- and *al*-infinitives pattern alike in allowing overt subjects (be they R-expressions or pronouns (5)), *sin*-infinitives with null subjects differ radically from *al*-infinitives in allowing null subjects that need not be controlled, leading us to characterize the latter as *pro*. Applying the diagnostics for OC vs. NOC in (15) to (18) allows us to confirm this characterization.


'María stopped working without her saying anything and Rosa did too.'

	- a. ✓*BVA* No, Daniela also stopped working without her own signed authorization. Daniela (λx(x stopped working without x's signed authorization)).

	- b. ✓*Coreference* No, Daniela also stopped working without María's signed authorization. (In a context where María was the only person that could sign the authorization). Daniela (λy(y stopped working without her authorization)) (her=María).

*Sin*-infinitives exhibit all the properties of NOC. (15) shows that their null subject need not be c-commanded by its antecedent, and (16) that it need not be locally bound since long-distance antecedents are permitted. The ellipsis and association-with-focus tests in (17) and (18) show that their null subject allows for both BV and coreferential readings. These findings confirm that these silent subjects are of the *pro*-type, not of the (OC) PRO type.<sup>10</sup>

#### **2.3** *Para***-infinitives**

Just as in *al*-infinitives, the null subject in *para*-infinitives is referentially dependent, which we took to suggest it should also be PRO (see (4)/(6)).<sup>11</sup> Unlike in *al*-infinitives, however, and as indicated in example (6), the overt subject in these constructions is also referentially dependent, leading us to characterize it as overt PRO. We apply below the diagnostic tests for OC vs. NOC to both overt PRO and covert PRO:

	- a. ✓Sloppy reading *(BVA)* María left in order for María to be happy.
	- b. ✗Strict reading *(Coreference)* María left in order for Juan to be happy.

<sup>10</sup>The experiment (Gómez In progress) (see Footnote 8) showed that in contrast to both *para*-infinitives and *al*-infinitives, strict readings in *sin*-infinitives, just like sloppy readings, were accepted systematically by speakers.

<sup>11</sup>See Footnote 8.

Overt and covert controlled subjects pattern exactly the same under these three tests. They need to be c-commanded by their antecedent (19), locally bound (20), and only allow BV readings under ellipsis (21). In sum, both types of subjects uniformly exhibit OC properties. These results confirm that the null subject is OC PRO, and that the overt subject is its overtly realized counterpart "Overt PRO" (Piera 1987, Mensching 2000, Livitz 2011, Livitz 2014). Crucially, however, the association-with-focus test in (22)/(23) yields dissonant results:

	- a. ✓*BVA* No, Daniela also cheated in order for herself to win. Daniela (λy (y also cheated in order for y to win)).
	- b. ✗*Coreference* No, Daniela also cheated in order for María to win. Daniela (λy (y also cheated in order for her to win)) (her=María)

The statement in (22) with a null (by hypothesis, PRO) subject can only be denied in one way – that is, on its BVA construal, as expected. This result converges with those from the three previous tests in signaling OC. In contrast, the statement in (23) with an overt (by hypothesis, PRO) subject can be denied in either of two ways – that is, on either its BVA or coreferential construal.

	- a. ✓*BVA* No, Daniela also cheated in order for herself to win. Daniela (λy (y also cheated in order for y to win))
	- b. ✓*Coreference* No, Daniela also cheated in order for María to win. Daniela (λy (y also cheated in order for her to win)) (her=María)

The availability of both readings here for overt PRO is surprising, given that it is characteristic of NOC – and more generally of pronouns, see discussion of Fodor's examples (7)–(9) – and that the results of all the previous tests converge on an OC diagnostic.

#### **2.4 Interim summary**

The results obtained in this section are summarized in Table 1.


Table 1: Null vs. overt subjects in CS infinitival adjuncts.

A complex picture emerges from these findings. First, nonfinite adjuncts divide up into three classes. The first class, instantiated by *al*-infinitives, displays OC properties with null subjects while allowing for overt subjects. The second class, instantiated by *sin*-infinitives, displays the properties of NOC, systematically allowing for referentially free subjects – be they silent or lexical. And the third class, instantiated by *para*-infinitives, exhibits the properties of OC fully with null PRO, but not overt PRO. Second, overt subjects do not display homogeneous behavior across OC adjuncts. In particular, while *al*-infinitives allow either overt pronouns or R-expressions in subject position (alongside null PRO), *para*-infinitives only allow overt PRO (alongside null PRO). Finally, regarding the results of the OC/NOC tests in *para*-infinitives, a surprising contrast has emerged. On the one hand, null PRO displays all the characteristic properties of OC since it must be locally c-commanded by its antecedent, and it only yields BV readings under the ellipsis and association-with-focus tests. On the other hand, overt PRO displays most, but not all the characteristics of OC. It must be locally c-commanded by its antecedent, and it only yields BV readings under the ellipsis test. Crucially, however, it yields both coreference and BV readings under the association-with-focus test, patterning this time like a non-obligatorily controlled subject. These unexpected findings are recapitulated in (24):

	- (i) Both null and overt PRO only allow BV interpretations under the ellipsis test.
	- (ii) Overt PRO, unlike null PRO, also allows coreferential interpretations under the association-with-focus test.

Why do null PRO and overt PRO pattern differently (yield conflicting results) with respect to the two standard tests for pronominal interpretation? In particular, why does overt PRO, unlike null PRO, also allow coreference interpretations, and why so only under the association-with-focus test, but not under the ellipsis test?

We address this paradox in §4 below. First, however, we turn to the analysis of NOC as instantiated in *sin*-infinitives.
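The interim results of §2 can also be restated as a small decision procedure. The Python sketch below is purely illustrative: the names `Profile`, `classify` and `SUBJECTS` are assumptions introduced here, not part of the experimental protocol, and the Boolean values simply transcribe the judgments reported in §2.1–§2.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Diagnostic results for one infinitival subject (hypothetical encoding)."""
    needs_c_command: bool        # antecedent must c-command the subject
    needs_local: bool            # antecedent must be local
    strict_under_ellipsis: bool  # strict (coreferential) reading under ellipsis
    coref_under_focus: bool      # coreferential reading under association with focus


def classify(p: Profile) -> str:
    """Return 'OC', 'NOC', or 'mixed' from the four diagnostics."""
    oc_like = [
        p.needs_c_command,
        p.needs_local,
        not p.strict_under_ellipsis,
        not p.coref_under_focus,
    ]
    if all(oc_like):
        return "OC"
    if not any(oc_like):
        return "NOC"
    return "mixed"


# Values transcribed from the judgments reported in sections 2.1-2.3.
SUBJECTS = {
    "al, null": Profile(True, True, False, False),
    "sin, null": Profile(False, False, True, True),
    "para, null PRO": Profile(True, True, False, False),
    "para, overt PRO": Profile(True, True, False, True),
}

for name, profile in SUBJECTS.items():
    print(f"{name}: {classify(profile)}")
```

Only overt PRO in *para*-infinitives falls outside the two uniform profiles, which is the paradox taken up in §4.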

### **3 Non Obligatory Control in** *sin***-infinitives**

*Sin*-infinitives exhibit a uniform NOC pattern. Assuming with Hornstein (1999) that a NOC null subject is not PRO, but *pro*, we develop here an analysis that not only explains why overt DPs and *pro* subjects alternate in these constructions, but also correctly predicts that they yield sloppy readings otherwise unavailable with (overt) pronouns. We put forth an analysis of NOC in terms of DP-ellipsis, in line with ellipsis approaches to null arguments in *pro*-drop languages (cf. a.o. Oku 1998, Saito 2007, Duguine 2013, 2014, Takahashi 2014). *Sin*-infinitives are thus constructions in which DPs (R-expressions or pronouns) surface in subject position and, under the right conditions, can also undergo ellipsis.

#### **3.1 A DP-ellipsis analysis of NOC subjects**

This section shows that the *pro*-like null subjects of *sin*-infinitives can behave as complex R-expressions, displaying non-pronominal readings. Consequently, they are better analyzed as elided DPs than as null pronouns. The crucial piece of data is given below. Consider a conversation like (25) where the subject of the *sin*-infinitive in (25b) is null:

(25) a. María dejó de trabajar [sin [su jefa] decirle nada].
María stopped of work.inf without poss boss say.inf.3sg nothing
'María stopped working without her boss saying anything to her.'

b. Y Ana también dejó de trabajar [sin [Ø] decirle nada].
and Ana also stopped of work.inf without say.inf.3sg nothing
'And Ana also stopped working without saying anything to her.'

The sentence in (25b) is ambiguous. The null subject can refer either to María's boss or to Ana's boss and the sentence can thus be interpreted as in (26a) or (26b) respectively:

	- a. ✓Strict-like reading And Ana also stopped working without María's boss saying anything to her.
	- b. ✓Sloppy-like reading And Ana also stopped working without her own boss saying anything to her.

The strict reading in (26a) can be explained if the null subject in (25b) is a covert pronominal expression, coreferring with the DP *su jefa* 'her boss' introduced in the previous discourse in (25a). However, since Ana's boss has not been introduced in the prior discourse context, a pronominal expression cannot logically give rise to the sloppy interpretation in (26b). This is confirmed by (27): substituting an overt pronoun for the silent subject in (25b) can give rise to the strict interpretation in (26a), but not to the sloppy one in (26b):

	- b. Y Ana también dejó de trabajar [sin ella decirle nada].
	  and Ana also stopped of work.inf without she.nom say.inf.3sg nothing
	  'And Ana also stopped working without her saying anything to her.'

The sloppy reading for the sentence in (25b) can however be accounted for if we postulate that, rather than a pronoun, the null subject is the covert complex R-expression *su jefa* 'her boss'. This analysis would make it possible for the possessive pronoun *su* inside the (covert) R-expression to be bound by the higher subject *Ana*, giving rise to the targeted sloppy reading in (26b). The ambiguity of (28), where the null subject has been overtly spelled out as the DP *su jefa*, and which allows both the sloppy reading in (26b) and the strict reading in (26a), corroborates our analysis:

	- b. Y Ana también dejó de trabajar [sin [su jefa] decirle nada].
	  and Ana also stopped of work.inf without poss boss say.inf.3sg nothing
	  'And Ana also stopped working without her boss saying anything to her.'

That (28b), with an overt R-expression in subject position, and (25b), with a silent subject, yield the same sloppy reading suggests that, rather than a null pronominal, the null subject of (25b) is a syntactically null complex R-expression embedding the pronoun *su*. Interestingly, these data parallel data pointed out by Duguine (2013, 2014) in the realm of pro-drop in finite clauses in Spanish. Duguine shows that (29b), with a null subject in a finite context, allows the two interpretations in (30):

(29) a. María cree [que [su jefa] le exigirá mucho trabajo]. (adapted from Duguine 2014: 520)
María believes that poss boss cl.3sg.dat require.fut.3sg much work
'María believes that her boss will require a lot of work from her.'

b. Y Ana espera [que [Ø] le dejará los fines de semana libre].
and Ana hopes that cl.3sg.dat leave.fut.3sg the ends of week free
'And Ana hopes [Ø] will leave her the weekends free.'

(30) a. ✓Strict reading And Ana hopes that María's boss will leave her the weekends free.

b. ✓Sloppy reading And Ana hopes that her (own) boss will leave her the weekends free.


As already mentioned with regard to (25), postulating that the silent embedded subject is a pronominal incorrectly predicts that (29) should be unambiguous, allowing only the strict reading in (30a), exactly as is the case for (31) where an overt pronoun has been substituted for the silent subject.

	- a. ✓Strict reading
	- b. ✗Sloppy reading

The lack of ambiguity in (31) has the same logical explanation as above: There is no prior discourse antecedent with which the pronoun could corefer that would allow the sloppy interpretation in (30b). In contrast, substituting the full R-expression *su jefa* for the silent subject does give rise to the intended sloppy reading, as illustrated in (32):

(32) Y Ana espera [que [su jefa] le dejará los fines de semana libre].
and Ana hopes that poss boss cl.3sg.dat leave.fut.3sg the ends of week free
'And Ana hopes her (own) boss will leave her the weekends free.'


As already pointed out with respect to (25b), the ambiguity of (29b), on a par with that of (32), suggests that null arguments can be null complex R-expressions. Based on this observation, Duguine (2013, 2014) proposes that null (finite) subjects in Spanish (and other languages such as Basque) are elided DPs. Under this view, the interpretations in (30a) and (30b) come out as strict and sloppy readings (respectively), arising under ellipsis. Duguine's analysis builds on previous literature on null arguments in East-Asian languages such as Korean or Japanese. As is indeed well-known, null arguments in these languages allow non-pronominal interpretations (cf. Oku 1998, Kim 1999, Saito 2007, Takahashi 2014). This is illustrated in Japanese (33), where the null object in the second conjunct can yield not only a pronominal/strict, but also an anaphoric/sloppy reading:

	- a. ✓Strict reading Ken defended Taroo.
	- b. ✓Sloppy reading Ken defended himself.

The sloppy reading in (33b) indicates that the null object has the properties of a locally bound anaphor. Sloppy readings represent a major challenge for analyses of null arguments assuming *pro* as a primitive – e.g. Chomsky (1982), Rizzi (1986) – since in the configuration in (33b), the null pronominal would be locally bound, violating Binding Condition B (see (34a)). This problem disappears, however, with an analysis in terms of ellipsis – here ellipsis of an anaphor, as indicated in (34b) (cf. Oku 1998, Kim 1999, Saito 2007, Takahashi 2014):

	- b. ✓Strict reading *Ken-wa* zibun-o *kabatta*.
	  Ken-top self-acc defended

In sum, the availability of sloppy readings for the null subjects of *sin*-infinitives provides a compelling argument for a DP-ellipsis analysis of NOC.

#### **3.2 Parallelism conditions on DP-ellipsis**

Building on Fox's (2000) Conditions on NP-Parallelism, Duguine (2013, 2014) develops an analysis of argument ellipsis which accounts for the varieties of readings null arguments give rise to. Under this account, an elided DP must satisfy either of the conditions in (35):

	- a. have the same referential value (Referential Parallelism), or
	- b. be linked by identical dependencies (Structural Parallelism).


The Conditions on DP-Parallelism will provide a formal account of the strict and sloppy readings that NOC null subjects display. The relevant piece of data in (25) and its possible interpretations in (26) are repeated in (36)–(37):

	- b. Y Ana también dejó de trabajar [sin [Ø] decirle nada].
	  and Ana also stopped of work.inf without say.inf.3sg nothing
	  'And Ana also stopped working without saying anything to her.'

	- b. ✓Sloppy-like reading And Ana also stopped working without her own boss saying anything to her.

Now, to derive the strict reading in (37a), the condition on Referential Parallelism must be satisfied. The latter requires the null subject in (36b) to share the same referential value as its discourse antecedent, *su jefa* in (36a). We can achieve this by postulating the representation in (38b) where the pronoun *ella* 'her', coreferring with its discourse antecedent *su jefa*, undergoes ellipsis:

	- b. Y Ana también dejó de trabajar [sin ella decirle nada].

On the other hand, to derive the sloppy reading in (37b), the condition on Structural Parallelism must be satisfied. The latter requires identical binding dependencies across both (36a) and (36b). We can achieve this by postulating the derivation in (39b): the complex DP *su jefa* occupying the embedded subject position and containing a pronoun bound by the matrix subject undergoes ellipsis, yielding the surface structure in (36b) under the reading in (37b).

	- b. Y Ana también dejó de trabajar [sin [su jefa] decirle nada].

(39a) and (39b) display identical anaphoric dependencies between the matrix subject, the possessive pronoun and the (dative) clitic, thus satisfying the condition on Structural Parallelism and licensing ellipsis in (39b).
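The way the two conditions in (35) sort candidate sources for the null subject can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions, not an implementation of Duguine's analysis: the `DP` record, the string label used for the binding dependency, the helper `boss_of`, and the "unrelated" control case are all hypothetical devices introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DP:
    """A subject DP in a toy model (hypothetical encoding)."""
    form: str                  # surface form, e.g. "su jefa" or "ella"
    referent: str              # individual the DP picks out
    dependency: Optional[str]  # binding dependency it takes part in, if any


def boss_of(person: str) -> str:
    """Name the referent of 'X's boss' (illustrative helper)."""
    return f"{person}'s boss"


def licensed(elided: DP, antecedent: DP) -> Optional[str]:
    """Return the parallelism condition that licenses ellipsis, if any."""
    if elided.referent == antecedent.referent:
        return "Referential Parallelism"
    if elided.dependency is not None and elided.dependency == antecedent.dependency:
        return "Structural Parallelism"
    return None


BOUND = "poss bound by matrix subject"

# Antecedent clause: [su jefa], with su bound by the matrix subject María.
antecedent = DP("su jefa", boss_of("María"), BOUND)

# Candidate sources for the null subject when the matrix subject is Ana.
candidates = {
    "strict": DP("ella", boss_of("María"), None),     # elided pronoun
    "sloppy": DP("su jefa", boss_of("Ana"), BOUND),   # elided complex DP
    "unrelated": DP("ella", "Rosa", None),            # hypothetical control case
}

for reading, dp in candidates.items():
    print(reading, "->", licensed(dp, antecedent))
```

In this toy setting the strict source is licensed by shared reference and the sloppy source by the shared binding dependency, while a pronoun with an unrelated referent satisfies neither condition and so cannot be elided.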

#### **3.3 Interim conclusion**

The DP-ellipsis analysis of non-obligatorily controlled null subjects in *sin*-infinitives elegantly predicts that they allow sloppy readings otherwise unavailable with overt pronouns. It also automatically explains why overt subjects in *sin*-infinitives freely alternate with referentially free null subjects. That is to say, ellipsis presumes this alternation to be possible in the first place: DP-ellipsis targets overt DPs and, as such, can only occur where overt DPs can surface, in the same way that VP-ellipsis can only occur where overt VPs can surface.

### **4 Obligatory Control: the overt vs. covert PRO puzzle (*para*-infinitives)**

§2.3 showed that null vs. overt PRO in *para*-infinitives converge in all but one of their properties: While under both the ellipsis and focus particle tests the former only allows BV interpretations, the latter also allows coreferential readings but under the association-with-focus test only. This is summarized in Table 2.


Table 2: Null vs. overt PRO in para-infinitives.

We now tackle the questions raised in §2.4, regarding how to reconcile our contradictory findings and solve the paradox that these unexpected results give rise to, repeated in (40) (see also (24)):

	- (i) Both null and overt PRO only allow BV interpretations under the ellipsis test.
	- (ii) Overt PRO, unlike null PRO, also allows coreferential interpretations under the association-with-focus test.

Why do null PRO and overt PRO pattern differently (yield conflicting results) with respect to the two standard tests for pronominal interpretation? In particular, why does overt PRO, unlike null PRO, also allow coreference interpretations, and why so only under the association-with-focus test, but not under the ellipsis test?

First of all, it should be pointed out that the conflicting patterns of interpretation that arise with overt PRO also arise with overt anaphors such as *himself* in English or *se* in French. Indeed, both have also been reported to allow coreferential readings alongside BVA under the association-with-focus test. This state of affairs is illustrated in (41) through (44), taken from Sportiche (2014) (based on a remark due to M. Prinzhorn about German).<sup>12</sup>

	- b. Seul Pierre se rase.
	  only Pierre cl.3 shave
	  'Only Pierre shaves himself.'

That both statements are ambiguous between a BV and a coreferential reading is shown by the fact they can be denied in two different ways:

(42) a. No, I shave myself too.
     b. Non, moi aussi je me rase.
        no I also I cl.1sg.acc shave
        'No, I shave myself too.'
     →VP property: λx(x shave x) *BVA*

(43) a. No, I shave him too.
     b. Non, moi aussi je le rase.
        no I also I cl.3sg.acc shave
        'No, I shave him too.'
     →VP property: λx(x shave y) with y = Pierre *Coreference*

Furthermore, Sportiche (2014) points out that VP-ellipsis does not give the same result as association with *only* since it does not allow a coreferential interpretation for anaphors. For instance, the second conjunct in (44a) or (44b) below cannot be interpreted as meaning "Pierre shaved Jean":

(44) a. Jean s'est rasé et Pierre aussi.
        Jean cl.3.is shaved and Pierre also
        'Jean shaved himself and Pierre did too.'
     b. Jean shaved himself and Pierre did, too.

<sup>12</sup>For discussion of these issues, see Büring (2005) and references therein, as well as Footnote 13.

The contrast above suggests that the conflicting patterns of interpretation that overt PRO yields are not a surprising state of affairs, but rather reflect a more general property of overt anaphors, be they self-anaphors or overt PRO. We thus put forth the generalizations in (45) to account for the overt vs. covert PRO paradox:<sup>13</sup>

	- (i) Both null and overt anaphors need to be syntactically bound.
	- (ii) Overt anaphors can be semantically bound, null anaphors *must* be semantically bound.

The generalizations in (45) require null anaphoric expressions such as PRO to be both syntactically bound (that is, coindexed with a c-commanding DP) and semantically bound (that is, interpreted at LF as a variable bound by a predicate abstractor/operator), while only enforcing syntactic binding for overt anaphors, which include self-anaphors like *se* and *himself*, as well as overt PRO. Consider first the ellipsis context in (21), repeated here as (46):

	- a. ✓Sloppy reading *(BVA)* María left in order for María to be happy.
	- b. ✗Strict reading *(Coreference)* María left in order for Juan to be happy.

<sup>13</sup>Büring (2005: 141) discusses the wrong prediction made for reflexives in association-with-focus constructions (namely, that they should only allow BV construals, contrary to fact), concluding with a suggestion similar in spirit to ours: "As far as I know, this wrong prediction has not been addressed in the pertinent literature. The only immediate way to capture this behavior would seem to be to reformulate Binding Condition A so as to require that reflexives be *either* semantically or syntactically bound within their local domain, accepting the fact that Binding Conditions A and B are simply not on a par". This suggestion, however, does not carry over straightforwardly to ellipsis contexts where, as discussed by Büring, the pattern is more complex. Roughly, strict readings are generally impossible in coordinated ellipsis (i), but possible in subordinated ellipsis (ii), a generalization advocated by Hestvik (1992):

(ii) John defended himself better than Peter. (strict or sloppy)

As Hestvik points out, the crucial factor at play in subordinated ellipsis, but lacking in coordinated ellipsis, is that the matrix antecedent of the reflexive on the strict reading c-commands the ellipsis site and, as such, can bind the anaphor in the elided VP.

This contrast is in keeping with the generalization in (45(i)) since the anaphor in the ellipsis site can satisfy the syntactic binding requirement in (ii) but not in (i).

Both overt and null PRO only allow the BV construal in (46a). This is accounted for by the syntactic binding requirement in (45(i)) which anaphors, whether they are overt or not, are required to satisfy. The sloppy/BV reading is available because the representation in (47a) satisfies (45(i)) since in each conjunct, the matrix subject locally binds (c-commands) the embedded overt/null PRO.

	- b. ✗Juan se fue [para él /[Ø] estar feliz] y María también <se fue [para él /[Ø] estar feliz]>.

In contrast, the strict/coreference reading in (46b) is unavailable because the representation in (47b) fails to satisfy (45(i)) since the overt/null PRO embedded in the second conjunct is not bound (c-commanded) by the matrix antecedent in the first conjunct.

We turn next to the association-with-focus paradigm in (22)–(23), repeated below. This time, (45(i)) does not filter out the coreferential reading in (48) since both overt and null PRO can be bound by the matrix antecedent, thus satisfying (45(i)):

(48) Sólo María hizo trampa [para ella/[Ø] ganar el primer lugar].
     only María made trap [for she/[Ø] win.inf the first place]
     'Only María cheated in order for herself to win the first place.'

(45(ii)), however, is at play here, explaining why null and overt PRO cannot be interpreted alike in this context:

(49) Null PRO

	- a. ✓*BVA* No, Daniela also cheated in order for herself to win. Daniela (λy (y also cheated in order for y to win)).
	- b. ✗*Coreference* No, Daniela also cheated in order for María to win. Daniela (λy (y also cheated in order for her to win)) (her = María)

(50) Overt PRO

	- a. ✓*BVA* No, Daniela also cheated in order for herself to win. Daniela (λy (y also cheated in order for y to win)).
	- b. ✓*Coreference* No, Daniela also cheated in order for María to win. Daniela (λy (y also cheated in order for her to win)) (her = María)

Example (50) with an overt PRO subject in the *para*-clause can be denied either on its BV construal in (50a) or on its coreferential construal in (50b). In contrast, (49) with a null PRO instead can only be denied on its BV construal in (49a) but not on its coreferential construal in (49b). This contrast establishes that the coreferential reading is available for overt PRO but not for null PRO. It follows, moreover, automatically under (45(ii)): null PRO, unlike its overt counterpart, must be semantically bound, and thus obligatorily interpreted as a BV. In contrast, overt PRO can but need not be semantically bound and, as such, is free to corefer with its DP antecedent *María*.
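The way the two generalizations in (45) jointly derive the pattern summarized in Table 2 can be sketched as a small decision procedure. The following is only an illustrative sketch: the function name, the string labels for the tests and readings, and the reduction of the syntactic binding requirement to a single c-command flag are assumptions introduced here for exposition, not part of the analysis itself.

```python
def available_readings(pro_is_overt, test):
    """Readings predicted by (45) for PRO in a para-infinitive.

    test: "ellipsis" (coordinated VP-ellipsis) or "focus"
    (association with 'sólo'). Both labels are hypothetical names.
    """
    if test not in ("ellipsis", "focus"):
        raise ValueError("test must be 'ellipsis' or 'focus'")

    # The bound variable construal satisfies both (45(i)) and (45(ii)).
    readings = {"BV"}

    # (45(i)): coreference still requires syntactic binding by the
    # antecedent. In coordinated ellipsis the antecedent sits in the
    # first conjunct and does not c-command PRO in the second one.
    antecedent_c_commands_pro = (test == "focus")

    # (45(ii)): only overt anaphors may dispense with semantic binding.
    if pro_is_overt and antecedent_c_commands_pro:
        readings.add("coreference")
    return readings


for overt in (False, True):
    for test in ("ellipsis", "focus"):
        label = "overt PRO" if overt else "null PRO"
        print(label, test, sorted(available_readings(overt, test)))
```

Under these assumptions, coreference surfaces in exactly one cell: overt PRO under the association-with-focus test.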

### **5 Conclusion**

We started off by exemplifying how adjunct clauses in CS instantiate three different but systematic patterns of exception to the generalizations commonly assumed to hold of infinitival subjects (as stated in (1)):


We explored a novel DP-ellipsis analysis of NOC (*sin*-infinitives), providing compelling arguments in favor of ellipsis over the alternative silent pronoun (*pro*) analysis (advocated by Hornstein 1999 for NOC). In particular, DP-ellipsis elegantly predicts that non-obligatorily controlled subjects allow sloppy readings otherwise unavailable with overt pronouns and automatically explains why overt subjects in *sin*-infinitives freely alternate with referentially free null subjects.

The null subjects of *para*-infinitives display all the signature properties of OC PRO. Interestingly, overt pronouns in the same position, while also displaying diagnostic properties of OC, yield dissonant results with respect to the tests for pronominal interpretation. Under the association-with-focus test (but not the ellipsis test), a NOC-like pattern emerges, with overt PRO (but not null PRO) allowing coreferential readings, alongside BV readings. We suggested that this state of affairs appears to reflect a more general property of overt anaphors, be it overt PRO or self-anaphors. We argued that the conflicting patterns of interpretation that covert vs. overt PRO yield in *para*-infinitives follow from the *Anaphor Generalizations* in (45), which require null anaphors to be syntactically and semantically bound while enforcing only syntactic binding for overt anaphors.

### **Acknowledgements**

We would like to thank María Arche, the audiences at LSRL50, WOSSP16, BLINC3, Spadlsyn, and two anonymous reviewers for their useful comments. This work was partially funded by the following projects: AThEME (Advancing the European Multilingual Experience) funded by the European Seventh Framework Programme for research, technological development and demonstration grant agreement no. 613465, Région des Pays de la Loire; UV2 ANR18-FRAL0006 (ANR-DFG); PGC2018-096870-B-I00 (MICINN & EAI); IT769-13 (Eusko Jaurlaritza); BIM ANR17-CE27-0011 (ANR).

### **References**


Büring, Daniel. 2005. *Binding theory*. Cambridge: Cambridge University Press.

Chomsky, Noam. 1981. *Lectures on Government and Binding*. Dordrecht: Foris.




Livitz, Inna. 2011. Incorporating PRO: A defective goal analysis. *NYU Working Papers in Linguistics* 3. 95–119.


Torrego, Esther. 1998. Nominative subjects and pro-drop INFL. *Syntax* 1. 206–219.

Williams, Edwin. 1980. Predication. *Linguistic Inquiry* 11. 203–238.

Zagona, Karen. 2002. *The syntax of Spanish*. Cambridge: Cambridge University Press.

## **Chapter 8**

## **Oblique DOM and co-occurrence restrictions: How many types?**

Monica Alexandrina Irimia<sup>a</sup>

<sup>a</sup>University of Modena and Reggio Emilia

This paper examines co-occurrence restrictions involving oblique dom in (standard and leísta) Spanish and Romanian. Even a limited set of data reveals at least six puzzles, some of which are novel, ranging from differences in the syntactic behavior of oblique dom on clitics as opposed to full DPs to unsystematicity of repair strategies. It is shown that the *narrow local* domain where the relevant ([person]) features are licensed plays a role in these patterns, beyond the split Agree/Case.

#### **1 Oblique DOM and co-occurrence restrictions**

A defining trait of several Romance languages is the presence of object splits, under the broader phenomenon known as *differential object marking* (dom). The particular dom subtype we are concerned with here uses oblique morphology (henceforth *oblique* dom).<sup>1</sup> For example, in (standard) Spanish (1) or Romanian (3) a human D(irect) O(bject) DP needs to be introduced by a preposition, as opposed to the inanimate DOs in (2) or (4). The split extends to DO clitics too, as documented for leísta Spanish, with the contrast in (5) vs. (6) from Ormazabal & Romero (2007; ex. 15a, b, adapted).

(1) Vi **\*(a)** la niña.
    see.pst.1sg dat=dom the girl
    'I saw the girl.'

(2) Vi **(\*a)** el libro. (Spanish)
    see.pst.1sg dat=dom the book
    'I saw the book.'

(3) Nu văd **\*(pe)** nimeni.
    neg see.1sg loc=dom nobody
    'I can't see anybody.'

(4) Nu văd **(\*pe)** copaci. (Romanian)
    neg see.1sg loc=dom trees
    'I can't see trees.'

(5) Lo vi.
    cl.3m.sg.acc see.pst.1sg
    'I saw it/him.'

(6) **Le** vi. (Leísta Spanish)
    cl.3m.sg.dat=dom see.pst.1sg
    'I saw him.'

<sup>1</sup> See Bossong (1991, 1998), Torrego (1998), Cornilescu (2000), Aissen (2003), Rodríguez-Mondoñedo (2007), Tigău (2011), López (2012), Ormazabal & Romero (2013a), Manzini & Franco (2016), Hill & Mardale (2021), a.o. We assume an accusative syntax for oblique dom.

A salient, although less discussed, property of oblique dom is the set of co-occurrence restrictions it gives rise to. For example, Ormazabal & Romero (2007)<sup>2</sup> have shown that Clobl=dom<sup>3</sup> bans the presence of an I(ndirect) O(bject) dative clitic, as in (7b).

(7) Leísta Spanish (Ormazabal & Romero 2007; ex. 16a, b, glosses adapted)


<sup>2</sup> See also Bleam (2000), Zdrojewski (2008), Ormazabal & Romero (2013a, 2013b, 2013c), among others.

<sup>3</sup> In order to individuate oblique dom on clitics (as in (6)) from oblique dom on full nominals (as in (1) or (3)), we encode the former as Clobl=dom and the latter as DPobl=dom. We also collapse the locative and the dative under the broader category 'oblique'.

Co-occurrence restrictions provide important insights into the nature of dom. However, even in the initial, pioneering observations, it became immediately clear that they are not uniform. This paper touches on precisely this issue. The contribution is two-fold; on the empirical side, it is interested in the landscape of these phenomena, using (standard and leísta) Spanish and standard Romanian.<sup>4</sup> Even a limited set of data reveals at least six puzzles, some of which are novel. Besides the differences between (leísta) Spanish Clobl=dom and DPobl=dom (Ormazabal & Romero 2007, §2 and §3), we touch on other problems such as: i) differences in the behavior of possessor vs. goal dative clitics with Romanian dom (§3); ii) splits between DPobl=dom and dom negative quantifiers (§4); iii) lack of systematicity of accusative clitic doubling as a repair strategy on Romanian dom (§4). On the theoretical side, §3 and §4 also show that the split Agree/Case is not sufficient to derive the data. §5 explores the proposal that the *narrow local* domain where the relevant ([person]) features need to be licensed plays a role in these types of co-occurrence restrictions. §6 contains the conclusions.

#### **2 Oblique DOM and the PCC**

In a pioneering analysis of co-occurrence restrictions triggered by oblique dom, Ormazabal & Romero (2007) reduced the ungrammaticality of examples such as (7b) to principles behind the better known P(erson) C(ase) C(onstraint) or *Me-Lui* phenomena. Across Romance, the latter have been extensively discussed for clitic clusters, following seminal work by Perlmutter (1971) and Bonet (1991).<sup>5</sup>

The standard Spanish examples below illustrate the so-called *strong* PCC. The ungrammaticality of (9a) is triggered by the DO (direct object) clitic that has a person feature (1st) which is hierarchically higher than the person feature of the IO clitic (3rd), as schematically summarized in (8). The ungrammaticality is avoided in (9b), as this time the DO is 3rd person, while the IO is 1st person.

	- a. \* Pedro **le/se** **me** envía.
	  Pedro cl.3sg.dat cl.1sg.acc send.3sg.subj
	  Intended: 'Pedro sends me to him.'
	- b. Pedro **me** **lo** envía.
	  Pedro cl.1sg.dat cl.3sg.acc send.3sg.subj
	  'Pedro sends him/it to me.'

<sup>4</sup>The data come from native speaker judgments, and from 20 native speaker consultants each for Spanish and Romanian, and 3 for leísta Spanish.

<sup>5</sup> See also Albizu (1997), Anagnostopoulou (2003), Béjar & Rezac (2003), among others.
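The contrast between (9a) and (9b) can be sketched as a simple check on clitic clusters. The snippet below is a minimal illustration, assuming the formulation of the strong PCC that is common in the literature (in a cluster of a dative and an accusative clitic, the accusative must be 3rd person). The function name and the integer encoding of person are hypothetical conventions adopted only for this sketch.

```python
def violates_strong_pcc(io_person, do_person):
    """Return True if a dative (IO) + accusative (DO) clitic cluster
    violates the strong PCC, as commonly formulated: the DO clitic
    must be 3rd person. Persons are encoded as 1, 2 or 3 (an
    illustrative convention, not the source's notation)."""
    for person in (io_person, do_person):
        if person not in (1, 2, 3):
            raise ValueError("person must be 1, 2 or 3")
    return do_person != 3


# (9a): IO = 3rd person (le/se), DO = 1st person (me)
print(violates_strong_pcc(io_person=3, do_person=1))
# (9b): IO = 1st person (me), DO = 3rd person (lo)
print(violates_strong_pcc(io_person=1, do_person=3))
```

On this encoding (9a) is flagged as a violation and (9b) is not, in line with the judgments reported above.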

Although initial accounts investigated a morphological explanation for the (strong) PCC, subsequent research (Albizu 1997, Anagnostopoulou 2003, Béjar & Rezac 2003, Preminger 2019, among others) has established its clearly *syntactic* source. A general idea in syntactic accounts has been that the PCC involves more than one category which requires *licensing* in the syntax, in a local configuration containing just one relevant licenser. To briefly cite two analyses, for Anagnostopoulou (2003) 1st and 2nd persons contain a [person] feature, which requires licensing just like the [person] feature introduced by all (inflectional) datives. Béjar & Rezac (2003) similarly assume an obligatory person licensing condition affecting speaker and hearer-related categories.

Ormazabal & Romero (2007, 2013a, 2013b, 2013c) follow the premises of intervention-based syntactic accounts of the PCC to explain co-occurrence restrictions induced by Clobl=dom, as in (7b). The reasoning goes as follows: Differential morphology on the DO clitic in (7b) signals grammaticalized animacy, which requires *obligatory licensing* via object agreement. A constraint is active which prohibits the verb from entering into other agreement operations, besides object agreement. This is formalized as the O(bject) A(greement) C(onstraint) in (10):

(10) OAC (Ormazabal & Romero 2007:50): If the verbal complex encodes object agreement, no other argument can be licensed through verbal agreement.
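As a rough illustration of how (10) rules out (7b), the constraint can be sketched as a predicate over the arguments of a verbal complex. Everything in the snippet (the `Argument` class, its field names, and the way licensing needs are recorded) is a hypothetical encoding introduced for exposition and should not be read as Ormazabal & Romero's formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    label: str
    # True if the argument must be licensed through verbal agreement.
    needs_verbal_agreement: bool
    # True if this argument is the one triggering object agreement.
    is_object_agreement: bool = False


def satisfies_oac(arguments):
    """OAC in (10): if the verbal complex encodes object agreement,
    no other argument can be licensed through verbal agreement."""
    if not any(a.is_object_agreement for a in arguments):
        return True  # the constraint is vacuously satisfied
    competitors = [
        a for a in arguments
        if a.needs_verbal_agreement and not a.is_object_agreement
    ]
    return not competitors


# Assumed encoding of (6) and (7b): 'le' triggers object agreement,
# while the dative clitic also needs licensing via agreement.
le = Argument("Cl_obl=dom 'le'", needs_verbal_agreement=True,
              is_object_agreement=True)
dative = Argument("Cl_dat 'te/me'", needs_verbal_agreement=True)

print(satisfies_oac([le]))          # 'le' alone, as in (6)
print(satisfies_oac([dative, le]))  # dative clitic + 'le', as in (7b)
```

The second call returns `False` because the dative clitic competes for licensing through the same verbal agreement that the differentially marked clitic already occupies.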

In fact, for Ormazabal & Romero (2007, 2013a, 2013b, 2013c), the OAC is the unifying factor behind all types of PCC. In oblique dom, grammaticalized animacy requires obligatory licensing but is relevant on all persons (including 3rd person), and thus will block *any* type of inflectional dative (clitic), which equally requires licensing. Moreover, the hypothesis that grammaticalized animacy, signalled by oblique dom, requires special syntactic licensing appears to find support elsewhere. For example, in Romanian a DPobl=dom results in ungrammaticality (for all the consultants in this study) in a context which also contains a Cldat interpreted as a possessor, irrespective of the person specification of the latter, as in (11a). Grammaticality is restored if oblique dom is removed (11b).

(11) Romanian: \*Cldat=poss DPobl=dom (dom blocked under possessor Cldat)


#### **3 Some problems**

However, several problems immediately became apparent. If grammaticalized animacy, which requires obligatory licensing, is what triggers oblique dom, one reasonable assumption is that DPobl=dom in (1) (from standard or leísta Spanish) should also trigger PCC effects. But this expectation is *not* (fully) borne out.

In (12a) we see that DPobl=dom is well formed with Cldat (irrespective of the latter's person feature). This contrasts with examples like (7b), repeated in (12b).

(12) Spanish: Oblique dom PCC on full nominals vs. clitics

a. ✓*Te/me* enviaron **a** todos los enfermos. (Leísta/Standard)
   cl.2/1sg.dat send.pst.3pl dat=dom all def sick people.m.pl
   'They have sent all the sick people to you/me.'

b. \* Te/me le di. (Leísta)<sup>7</sup>
   2/1cl.dat cl.3m.sg.dat=dom give.pst.1sg
   Intended: 'I gave him to you/me.'

DPobl=dom is also possible with an IO DP introduced by the (dative/locative) preposition *a*,<sup>8</sup> as in (13a) from Ormazabal & Romero (2013a). Crucially, in both leísta and standard Spanish, DPobl=dom becomes *ungrammatical* with an IO DP which is also *doubled* by a dative clitic. Thus, the example in (13b) is grammatical (for the speakers consulted here) only if the differential marker is removed.

(13) a. ✓ Enviaron **a** todos los enfermos *a la doctora*.
        send.pst.3pl dat=dom all the sick people.m.pl dat the doctor
        'They have sent all the sick people to the doctor.'
     b. *Le* enviaron (\***a**) todos los enfermos *a la doctora*.
        cl.3dat send.pst.3pl loc/dat=dom all.m.pl def.m.pl sick people.m.pl dat def.f.sg doctor
        Intended: 'They have sent all the sick people to the doctor.'

<sup>6</sup>Clacc (*-l*) doubling the DPobl=dom in (11a) triggers alternation in the shape of Cldat=poss in (11).

<sup>7</sup>In standard Spanish (and Romanian) 3rd person DO clitics only allow accusative morphology; thus, they do not grammaticalize animacy. Leísta varieties allow both DPobl=dom and Clobl=dom.

<sup>8</sup>This indicates that the effect is not due to haplology (the need to avoid two *a* sequences).

Complex problems are the norm in Romanian, too. In (11a) DPobl=dom is ungrammatical with a Cldat=poss. But there are (at least) two twists in the data. On the one hand, other types of dative clitics are tolerated by DPobl=dom. The sentence in (14) contains a *goal* dative clitic and a DPobl=dom and is *grammatical*, irrespective of the person of the former:

(14) Romanian – ✓oblique DOM with goal dative clitic
     ✓*Mi/ţi/i* (l)-au prezentat **pe** student.
     cl.1/2/3sg.dat cl.3msg.acc-have introduced loc=dom student
     'They have introduced the student to me/you (sg)/him.'<sup>9</sup> (cf. 11a)

On the other hand, there are also configurations where a Cldat-doubled IOdat *yields ungrammaticality* with DPobl=dom, even if the former is interpreted as a goal. In (15) we present a relevant example from Cornilescu (2020). In a sense, such sentences mirror the Spanish one in (13b), with *one difference*: In Romanian, "PCC effects" arise when DPobl=dom *binds into* the Cldat-doubled IO (cf. 14/fn. 9).

(15) Romanian (Cornilescu 2020, ex. 4; glosses adapted)
     Comisia (\**le*)-a repartizat **pe** mai mulţi<sup>i</sup> medici rezidenţi *unor* foşti profesori de-ai lor<sup>i</sup>.
     board.def.f.sg cl.3pl.dat-has assigned loc=dom more many.m medical residents some.dat.pl former.m professors of theirs
     Intended: 'The board assigned several medical residents to some former professors of theirs.'

Besides the removal of dom (a repair strategy equally available in standard and/or leísta Spanish), Romanian provides a second repair strategy for examples such as (15), namely accusative clitic doubling of DPdom.<sup>10</sup> This is seen in the *grammatical* sentence (16a), which contains a Cldat-doubled IO and a DPobl=dom which is *clitic doubled* using the accusative form of the clitic (cf. 15). A puzzle, however, is that Clacc-doubling of DPobl=dom is *not* a repair strategy in contexts that contain a dative clitic interpreted as a possessor, regardless of whether a possessor dative DP is also present. Example (11a) is adapted here as (16b).

<sup>9</sup> In these contexts a DPdat is also possible: *I<sup>i</sup> (l)-au prezentat pe student profesorului<sup>i</sup>* [professor.dat].

<sup>10</sup>Not all varieties of Spanish allow clitic doubling of DPobl=dom. See further remarks in §5.

	- a. ✓ Cldat DPdat<sup>i</sup> … Clacc DPobl=dom<sup>i</sup> (Cornilescu 2020, ex. 6; adapted)
	  Comisia *i* *l*-a repartizat **pe** fiecare<sup>i</sup> rezident *unei* foste profesoare a lui<sup>i</sup>.
	  board.def.f.sg cl.3sg.dat cl.3sg.m.acc-has assigned loc=dom each resident some.dat.sg.f former.f.dat professor.f.dat lk his
	  'The board assigned each resident to a former professor of his.'
	- b. \* Cldat=poss (DPdat) … Clacc DPobl=dom
	  \* *I*-*l* ajută **pe** prieten (lui Ion).
	  cl.3sg.dat-cl.3m.sg.acc help.3sg loc=dom friend (dat.3sg.m Ion)
	  Intended: 'He helps his/Ion's friend.'

In general, as we can see from these limited sets of data, the co-occurrence restrictions on oblique dom are extremely complex and still uncharted. A modest goal here is, first of all, empirical: to map which domains are relevant, and where the cross-linguistic similarities and differences are to be found. Let us first summarize the five (related) puzzles we have identified (see also Table 1):

	- Puzzle<sub>1</sub>: Assuming that DPobl=dom grammaticalizes animacy, it should trigger a PCC effect with dative clitics, similarly to Clobl=dom. Why is this prediction not borne out? What is the reason for this contrast, which we repeat in (18)?
	- Puzzle<sub>2</sub>: Why does Spanish DPobl=dom produce a PCC effect with an IO which is doubled by a dative clitic, as represented in (19)?
	- Puzzle<sub>3</sub>: Why does the restriction under Puzzle<sub>2</sub> obtain in Romanian (only) when DPobl=dom binds into a Cldat-doubled IO, as summarized in (20)?

$$\begin{array}{l} \text{(20)} \quad \text{Puzzle}_3\text{: } {}^{*}\text{Cl}_{\text{DAT}}\ \text{DP}_{\text{DAT}} \dots \text{DP}_{\text{OBL=DOM}_i} \text{ (Romanian 15) vs.}\\ \qquad \checkmark\,\text{Cl}_{\text{DAT}}\ \text{DP}_{\text{DAT}} \dots \text{DP}_{\text{OBL=DOM}} \text{ (Romanian 14)} \end{array}$$

	- Puzzle<sub>4</sub>: Why is Cldat=poss distinct from other dative clitics in that it triggers PCC effects in interaction with DPobl=dom in Romanian?

$$\begin{array}{l} \text{(21)} \quad \text{Puzzle}_4\text{: } {}^{*}\text{Cl}_{\text{DAT=POSS}} \dots \text{DP}_{\text{OBL=DOM}} \text{ (Romanian 11a, 16b) vs.}\\ \qquad \checkmark\,\text{Cl}_{\text{DAT=GOAL}} \dots \text{DP}_{\text{OBL=DOM}} \text{ (Romanian 14)} \end{array}$$


#### **4 Agree vs. Case**

Previous work has mostly been concerned with Puzzle<sub>1</sub>, namely the contrast between Clobl=dom, which gives rise to PCC effects with Cldat in (7b)/(12b), and DPobl=dom, which does not (12a, 14). As mentioned in §2 and §3, Ormazabal & Romero (2007, 2013a, 2013b, 2013c, et subseq.) attribute the ungrammaticality of examples like (7b)/(12b) to the OAC in (10). Grammaticalized animacy spelled out by Clobl=dom in (7b)/(12b) requires obligatory object agreement on the verb, blocking the licensing of any other argument through verbal agreement. Thus, Cldat, which equally needs licensing, remains unlicensed, causing ungrammaticality.

But, then, what is the status of grammaticalized animacy on full nominal dom (DPobl=dom), which is equally signaled via oblique morphology? Ormazabal & Romero (2007: 338) provide the following explanation for this contrast: "whatever rule or principle is involved in A-insertion (*in DPobl=dom, our note*) it has to be independent of object agreement." In later work, Ormazabal & Romero (2013a) associate Clobl=dom in (12b) with licensing in terms of Agree, while DPobl=dom (i.e., prepositional *a-*DOM, as in (1) or (12a)) involves licensing in terms of Case.

The Agree/Case divide can also, potentially, explain why examples such as (13a) are *grammatical*. The intuition is that the IO DP introduced by the preposition *a* (*a la doctora*) does not have a Case feature (it is a lexical dative, instead). Thus, it cannot compete for Case licensing with the Case feature in oblique dom on full nominals. In (13b), instead, the IO DPdat is doubled by a dative clitic. The latter contains a Case feature, which competes for licensing with the Case feature in DPobl=dom, introduced by the *a*-preposition. This is Puzzle<sub>2</sub>.


In §2 and §3 we have also seen that the data are complex and fine-grained. The question is whether we can extend the split Agree/Case to all the patterns examined here. One problem is Puzzle<sub>4</sub> from Romanian, which sets the dative possessor clitic apart from other types of dative clitics, as repeated in (23).

$$\begin{array}{l} \text{(23)} \quad \text{Puzzle}_4\text{: } {}^{*}\text{Cl}_{\text{DAT=POSS}} \dots \text{DP}_{\text{OBL=DOM}} \text{ (Romanian 11a, 16b) vs.}\\ \qquad \checkmark\,\text{Cl}_{\text{DAT=GOAL}} \dots \text{DP}_{\text{OBL=DOM}} \text{ (Romanian 14)} \end{array}$$

Here, the explanation would have to be that Cldat=poss needs licensing in terms of Agree, while other dative clitics either stay unlicensed or require licensing in terms of Case (or the other way around). The non-trivial question is what independent evidence would motivate this assumption. Similarly problematic is the contrast between (14) and (15). In what sense is this a matter of Case vs. Agree?

There is yet another complex issue regarding the licensing of DPobl=dom in terms of Case. A less discussed fact is that not all types of DPobl=dom trigger co-occurrence restrictions. For example, dom-ed *Neg(ative) Q(uantifier)s* (more easily) escape them. This is clearly seen in the contrast in (24) from Spanish. In Romanian, the data are even more subtle. While NegQobl=dom might be problematic for some speakers with *assign/distribute*-type predicates (Class A), irrespective of binding, as in (25b), *introduce*-type predicates (Class B) seem to be fine in (25c), as expected. But then, if oblique dom and clitic doubled datives compete for Case, leading to PCC in (24a) and (25a), why is the PCC avoided in (24b)?

	- b. No *le* enviaron **a** nadie *a la doctora*. (Spanish)
	  neg cl.3sg.dat send.pst.3pl dat=dom nobody dat the doctor
	  'They haven't sent anybody to the doctor.'

b. ? Comisia nu *i*-a repartizat **pe** nimeni *profesorului*.
   board.def neg cl.3sg.dat-has assigned loc=dom nobody professor.def.dat
   'The board hasn't assigned anybody to the professor.'<sup>12</sup>

c. Comisia nu *i*-a prezentat **pe** nimeni *profesorului*. (Romanian)
   board.def neg cl.3sg.dat-has introduced loc=dom nobody professor.def.dat
   'The board hasn't introduced anybody to the professor.'

<sup>11</sup>As Cornilescu (2020) also notices, the problem is not the putative absence of clitic doubling on DPdom. DPdom is grammatical without clitic doubling for all the speakers consulted here.
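The tension described above can be made concrete with a toy comparison between what a pure Case-competition account predicts and the judgments reported for the Spanish examples. The sketch below is a deliberate simplification: it assumes that oblique dom and a doubling dative clitic each contribute one Case feature and that at most one such feature can be licensed. The function name, the tuple layout and the dictionary are hypothetical devices for illustration only.

```python
def case_competition_predicts_ok(object_has_dom, io_is_clitic_doubled):
    """Toy Case-competition account (an assumed simplification):
    oblique dom and a doubling dative clitic each carry a Case
    feature, and only one Case feature can be licensed."""
    case_features = int(object_has_dom) + int(io_is_clitic_doubled)
    return case_features <= 1


# (object_has_dom, io_is_clitic_doubled, reported as grammatical)
OBSERVED = {
    "13a": (True, False, True),   # DP with dom, undoubled IO
    "13b": (True, True, False),   # DP with dom, clitic-doubled IO
    "24b": (True, True, True),    # NegQ with dom, clitic-doubled IO
}

mismatches = [
    example
    for example, (dom, doubled, grammatical) in OBSERVED.items()
    if case_competition_predicts_ok(dom, doubled) != grammatical
]
print(mismatches)
```

Under these assumptions the only mismatch is (24b): the account derives (13a) and (13b) but wrongly excludes the differentially marked negative quantifier, which is the problem the text raises.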

Assuming that differential marking on NegQs is not active syntactically is a non-starter. NegQobl=dom *is* blocked under other configurations which do *not* permit differential marking. One such case is the medio-passive se. The two examples below are ungrammatical in both Spanish (26a) and Romanian (26b).

(26) DOM under medio-passive *se*: Spanish and Romanian


Moreover, in Romanian, NegQobl=dom is still ungrammatical in a structure which contains a dative clitic interpreted as a possessor. In (27), we have forced a possessor reading of the dative clitic (i.e., *he didn't help anybody of his*). The consultants judge this example ungrammatical/degraded, contrary to (25c).


<sup>12</sup>As dom is obligatory on *nimeni*, the only repair here is the removal of the doubling Cldat (-*i*). Also, (25a) and (25b) show that these co-occurrence restrictions are not simply a matter of DPobl=dom binding into the Cldat-doubled IO; NegQobl=dom is not involved in any such operation in (25b).

In order to explain such examples, NegQobl=dom will need to be Case licensed in some contexts (26, etc.), but caseless in others (24b, etc.). We thus have yet another problem, as summarized under Puzzle<sub>6</sub>:

	- Puzzle<sub>6</sub>: Why does NegQobl=dom (more easily) escape the PCC in configurations involving clitic doubled IOdat, as summarized in (28)?

$$\begin{array}{l} \text{(28)} \quad \text{Puzzle}_6\text{: } \checkmark\,\text{Cl}_{\text{DAT}}\ \text{DP}_{\text{DAT}} \dots \text{NegQ}_{\text{DOM}} \text{ (24b, 25c)}\\ \qquad {}^{*}\text{Cl}_{\text{DAT}}\ \text{DP}_{\text{DAT}} \dots \text{DP}_{\text{DOM}} \text{ (13b, 25a)} \end{array}$$

In Table 1 we summarize the six puzzles. In §5 we explore a solution which (also) takes into account the *position* in which a certain category needs licensing.


Table 1: Six puzzles

#### **5 Oblique DOM and licensing positions**

#### **5.1 Oblique DOM and the possessor dative**

As the facts are clearer here, let us start with the problems involving the possessor clitic. Puzzle<sub>4</sub> showed that DPobl=dom is ungrammatical in configurations which contain a Cldat=poss, as in (11a), repeated in (29a). However, there are further quirks in the data. For all speakers, such structures significantly improve or are perfectly grammatical if DPobl=dom is dislocated to the left periphery, as in (29b). Moreover, if Cldat is not interpreted as a possessor on DPobl=dom, the structure is again grammatical. This is illustrated in (29c), where the possessor is interpreted on the PP-adjunct.<sup>14</sup> In fact, a possessor reading on DPobl=dom would be ungrammatical, as already shown in (27).

	- a. \**Şi<sup>i</sup>/\*mi<sup>i</sup>*-(l) ajută **pe** prieten*<sup>i</sup>*.

	  cl.3sg.refl.dat/1sg.dat-cl.3m.sg.acc help.3sg loc=dom friend

Intended: 'He is helping his own/my friend.'


This also implies that local, narrow domains *do* matter. As mentioned, we follow accounts which link oblique dom to a specification beyond Case. For simplicity, we encode it as a [person] feature (Cornilescu 2000, Rodríguez-Mondoñedo 2007, Richards 2008), which needs obligatory licensing in the syntax. The dative possessor clitic also encodes a [person] feature, which equally needs licensing. The data also indicate that this is a type of dative possessor clitic which is generated DP-internally and then raises to its spell-out position.<sup>15</sup>

<sup>14</sup>For lack of a more adequate notation, we indicate this connectedness via a subscript index.

<sup>15</sup>Landau (1999), Diaconescu (2004), a.o.

The more specific problem with examples such as (29a) is that the two [person] features are *too local* in the same KP, as represented in Figure 1. Additionally, in the local domain that contains these two [person] features, there is only one [person] licenser, on the functional projection we label α here (following López 2012). A crash can be avoided if one of the [person] features is removed from this local domain, for example via dislocation to/direct merge in the left periphery, as in (29b). Here, the [person] feature of the dislocated DPobl=dom can be licensed by a [person]-related functional projection in the C<sup>0</sup> domain, while the [person]-related specification in the possessor clitic is licensed by the α<sup>1</sup> head. Another possibility is to have the two [person] features on different categories, as in (29c); here, as schematically shown in Figure 2, the *Possessor*-related [person] feature is generated inside the PP, while the object DP contains a separate [person] feature. As we show in §5.3 and §5.4, a crash can be avoided depending on the narrow domain in which each of these [person] features is checked.<sup>16</sup>

Figure 1: [person] categories too local

<sup>16</sup>Onea & Hole (2017) and Onea (2018) derive ungrammaticality in examples like (29a) on the hypothesis that both oblique dom and the possessor clitic need licensing in a position above VP. As we see in this paper, this seems to be too coarse; there are instances (e.g., 29c in the relevant interpretation) where these two categories do not produce ungrammaticality, indicating that some other factor is at play too.

Figure 2: [person] categories in separate domains

#### **5.2 Puzzle<sup>1</sup> : Oblique dom on clitics and interaction with IO clitics**

Thus, the *position* in which [person] features are licensed is relevant. But this raises the question of which [person]-licensing positions are available. The literature contains a variety of proposals. As already mentioned above, López (2012) assumes that (oblique) dom is licensed<sup>17</sup> in an intermediate position between VP and *v*<sup>0</sup>,<sup>18</sup> which we denote by α<sup>1</sup>. Yet, Belletti (2005), Ciucivara (2009), and Stegovec (2020), among others, have identified a [person] (animacy) licensing field above *v*P, which is especially relevant for clitics. A third explicit proposal is that oblique dom on DPs has *v*<sup>0</sup> as a licenser (Rodríguez-Mondoñedo 2007, a.o.).<sup>19</sup> The three [person] licensers are illustrated in Figure 3. Importantly, what the data at hand show is that *all* these positions and licensers are relevant in their own way.

Let's turn now to the [person] field above *v*P. We assume that this area is involved in the licensing of oblique dom on clitics, as seen in leísta varieties of Spanish. Puzzle<sup>1</sup> is precisely concerned with the ungrammaticality of an oblique dom clitic in the context of an IO clitic. Crucially, this effect does not arise when a full nominal is differentially marked. We repeat the relevant examples in (30):

<sup>17</sup>Note that for López (2012), oblique dom involves licensing in terms of Case. As we have outlined some shortcomings of this hypothesis, we take dom to involve the licensing of a ([person]) feature beyond Case. This way we obtain better results both empirically and formally.

<sup>18</sup>One important piece of evidence for a licensing position below *v*<sup>0</sup> comes from the absence of binding effects into the EA from DPobl=dom (see López 2012: 41–46 for exemplification).

<sup>19</sup>Of course, a [person]-licensing field is also available in the CP. In fact, there are Romance varieties where DPdom is only possible on XPs that are overtly dislocated to the left periphery. See especially Belletti (2018) for Italian or Escandell-Vidal (2009, et subseq.) for Balearic Catalan.

(30) Puzzle<sup>1</sup>: \*Cldat … Clobl=dom (Leísta Spanish 7b, 12b) vs ✓Cldat … DPdat=dom (Spanish, Romanian 12a, 14)

a. \**Te/me **le** di.*

2/1.cl.dat cl.3m.sg.dat=dom give.pst.1sg

Intended: 'I gave him to you/me.'

b. *Te/me enviaron **a** todos los enfermos.*

2/1sg.cl.dat send.pst.3pl dat=dom all the sick.people.m.pl

'They have sent all the sick people to you/me.'

As the PCC effects induced by Clobl=dom are different from those of DPobl=dom, and given the problems with an analysis under the split Case/Agree, let's see what we obtain once the licensing position is taken into account. As DPobl=dom gets licensed in α<sup>1</sup>, it must be the case that Clobl=dom is licensed in a different position. We propose that this is the [person] domain above *v*P, which we abbreviate as α<sup>2</sup> (see the tree in Figure 3).<sup>20</sup> The problem with (30a) is that the same local α<sup>2</sup> domain also hosts the dative clitic, which encodes a [person] feature equally needing licensing. As there is only one [person] licenser available, namely the α<sup>2</sup> head, the derivation will crash, as in Figure 6. On the other hand, the two [person] features in (30b) can be licensed by two licensers found in different domains, as in Figure 4: α<sup>2</sup> for the [person] feature in the dative clitic, and α<sup>1</sup> for the [person] feature in DPobl=dom. This latter structure is also seen with Romanian ditransitives as in (14) (remember that these are either Class B verbs or configurations in which DPobl=dom does not bind into the IOdat).<sup>21</sup> Therefore, we also have part of the answer to Puzzle<sup>3</sup>.

#### **5.3 Oblique dom and clitic doubled datives**

Let's now turn to the explanation of Puzzle<sup>2</sup>, which involves the ungrammaticality of DPobl=dom with a dative IO that is clitic doubled by a dative clitic (in Spanish, and in Romanian configurations where DPobl=dom binds into the clitic doubled dative; see Table 1). In these structures (Figure 8), DPobl=dom contains a [person] feature needing licensing. Dative clitic doubling involves the introduction of a [person] feature on the (low) Appl head,<sup>22</sup> which equally needs licensing. As there is only one licenser available, namely the α<sup>1</sup> head, the derivation will crash.
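The licensing logic used in this section can be made concrete with a minimal sketch. It assumes, as a simplification of the account above, that each local domain provides exactly one [person] licenser and that a derivation crashes whenever two [person]-bearing items need licensing in the same domain. The labels `alpha1` (the position between VP and v), `alpha2` (the field above vP), `v` and `C`, the function `converges`, and the item names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical labels for the [person]-licensing domains discussed in the text:
# alpha1 (between VP and v), alpha2 (above vP), v, and C (left periphery).
DOMAINS = ("alpha1", "alpha2", "v", "C")


def converges(config):
    """Return True if no domain has to license more than one [person] feature.

    `config` maps each [person]-bearing item to the domain in which it
    needs licensing (an assumption of this sketch, not a parser).
    """
    for item, domain in config.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain for {item}: {domain}")
    load = Counter(config.values())
    return all(count <= 1 for count in load.values())


# (30a): dative clitic and DOM clitic compete for the single alpha2 licenser.
two_clitics = {"Cl_dat": "alpha2", "Cl_dom": "alpha2"}
# (30b): dative clitic licensed in alpha2, full DOM nominal in alpha1.
clitic_plus_dp = {"Cl_dat": "alpha2", "DP_dom": "alpha1"}
# Clitic-doubled dative ([person] on low Appl) and DOM nominal compete for alpha1.
doubled_dative = {"Appl_dat": "alpha1", "DP_dom": "alpha1"}

for name, cfg in [
    ("two_clitics", two_clitics),
    ("clitic_plus_dp", clitic_plus_dp),
    ("doubled_dative", doubled_dative),
]:
    print(name, "converges" if converges(cfg) else "crashes")
```

Under these assumptions only the second configuration converges, which matches the contrast between (30a) and (30b) and the clash with clitic-doubled datives. The sketch deliberately says nothing about how an item comes to be associated with a given domain.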

<sup>20</sup>Tests similar to the ones alluded to in fn. (16) actually show that Clobl=dom can be found above the EA, as opposed to DPobl=dom, which is found below *v*<sup>0</sup>.

<sup>21</sup>As expected, binding from the IO into DPobl=dom *is* possible, indicating that in these configurations the IO is higher (and if containing a [person] feature, it has an independent licenser for it, which does not interact with oblique dom).

<sup>22</sup>The evidence discussed by López (2012: 41–46) indicates that DPobl=dom binds into the IO, and not the other way around. Thus here the DPobl=dom is (interpreted) higher than the IO.

Figure 3: [person] licensing positions

In Romanian such configurations have a repair strategy which consists in accusative clitic doubling of DPdom, as in (16a), part of Puzzle<sup>5</sup>. The PCC effect is avoided because accusative clitic doubling, which involves the licensing of a [person] feature in α<sup>2</sup>, removes oblique dom from the domain of α<sup>1</sup> (see also Cornilescu 2020).<sup>23</sup> Thus the α<sup>1</sup> head can license the [person] feature on the clitic doubled dative, as shown in Figure 5. In dative possessor contexts, by contrast, nominal dom and the dative clitic are *too* local in the KP on first merge,<sup>24</sup> so accusative clitic doubling is not a repair strategy, and examples like (16b) are ungrammatical.

<sup>23</sup>Clitic doubled dom allows binding into the EA (as opposed to DPdom which is not clitic doubled), indicating a position above *v*P. See also Hill & Mardale (2021), a.o., for discussion.

<sup>24</sup>Only movement/direct merge in the CP (29b) can break this too local relationship. This indicates that C<sup>0</sup> introduces its own [person] zone, separate from the [person] zone below it.

Figure 4: [person] licensing above *v*P

#### **5.4 DOM on negative quantifiers**

Let's turn now to Puzzle<sup>6</sup>. The question is why NegQobl=dom can avoid a PCC effect with clitic doubled datives, as opposed to DPobl=dom, in examples like (31):

(31) Puzzle<sup>6</sup>: ✓Cldat DPdat … Neg Qdom (24b, 25c) vs \*Cldat DPdat(i) … DPdom(i) (13b, 15)

No *le* enviaron **a** nadie *a* *la* *doctora*. (Spanish)

neg cl.3sg.dat send.pst.3pl dat=dom nobody dat def.f.sg doctor

'They haven't sent anybody to the doctor.'

Figure 5: DOM doubling by accusative clitic above *v*P

Figure 6: Two [person] categories to be licensed by α<sup>2</sup> – clash

Figure 7: Licensing of DOM animate negative quantifiers

Although the explanation is more tentative, one possibility is to relate this to intrinsic properties of NegQobl=dom, which trigger raising higher than *v*<sup>0</sup>. For one, NegQobl=dom carries emphatic accent, related to a focus feature,<sup>25</sup> which forces raising. Therefore, animate NegQ has its accusative Case (and subsequently its [person] feature) licensed by *v*<sup>0</sup>; [person] on clitic-doubled datives is licensed by the α<sup>1</sup> head, as shown in Figure 7. This, however, would predict that examples like (25b) should always be grammatical. Although none of the consultants judged (25b) to be as ungrammatical as (25a), for some speakers these examples were not fully perfect either. Therefore, further research is clearly needed into this point, as well as into the more precise difference between Class A and Class B verbs (25b vs. 25c) and the effect of binding.

<sup>25</sup>See Giannakidou (2020), a.o., for discussion.

Figure 8: Two [person] categories to be licensed by α<sup>1</sup> – clash

Finally, raising to Spec, *v*<sup>0</sup> is not a repair strategy in contexts such as (27) for the same reasons mentioned above. And it does not work under medio-passive se in (24a) or (24b) either; semp involves the removal of structural *accusative* case.

In Table 2 we summarize the results obtained in this section.

#### **6 Conclusions**

This short paper has examined co-occurrence restrictions with oblique dom from (leísta and standard) Spanish and Romanian. The complexity and richness of an otherwise rather limited set of data give rise to six puzzles, which prove hard to reduce just to the split Agree/Case. We have found that an important factor behind these patterns is also the *narrow local* domain where the relevant [person] features are licensed. Obviously, oblique dom is part of many other co-occurrence restrictions, for example with variants of the Pan-Romance se, raising the question of how all these effects can be further unified.



#### Table 2: Six puzzles and their explanations

#### **Abbreviations**


### **Acknowledgements**

We would like to thank Ion Giurgea as well as the audiences at LSRL 50 and SLE 2021 for the discussion and very useful feedback. All errors are our own.

#### **References**


Monica Alexandrina Irimia


## **Chapter 9**

## **A superlative challenge for a syntactic account of connectivity sentences**

Nicoletta Loccioni<sup>a</sup>

<sup>a</sup>University of California Los Angeles

In this paper, I present two sets of data that challenge the "question plus deletion" (Q+D) approach to connectivity. The first set comes from Romance data where superlative import requires relativization, whereas the second set has more generally to do with relative clauses in subject position of specificational sentences. The problem comes down to what follows. Under Q+D, a conflict emerges between the assumed syntax of the post-copular clause and its interpretation. That is, the structural configuration required to satisfy Binding or NPI licensing cannot generate the desired (superlative) interpretation, at least not without relying on mysterious implicatures. The same problem does not arise for revisionist accounts, which maintain that variable binding does not require c-command and can therefore straightforwardly derive the correct meaning.

#### **1 Introduction**

Higgins (1973) convincingly showed that copular sentences like (1) are not the same as the ones in (2). Whereas the former clearly involve predication (the property of being really long is predicated of the subject referent in (1)), the latter do not. Rather, the sentences in (2) seem to involve valuing of a variable introduced by the pre-copular subject. That is, the subject expression sets up a variable, and the post-copular expression *My Brilliant Friend* provides the value for that variable, in a similar fashion to the question/answer pair in (3). These cases are normally referred to as specificational sentences, and they can consist of a pseudo-cleft (as in 2a), or the pre-copular phrase can be a relative clause (shown in 2b).

Nicoletta Loccioni. 2023. A superlative challenge for a syntactic account of connectivity sentences. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 181–194. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525108

	- a. What Betta read is *My Brilliant Friend*.
	- b. The book that Betta read is *My Brilliant Friend*.
	- b. *My Brilliant Friend*.

A well-established fact about specificational sentences like (2) is that they exhibit connectivity effects (see Akmajian 1970, Higgins 1973 and many others since). That is, they behave like their connected counterparts (the (b)-examples below) with respect to a variety of syntactic tests including Principles A, B and C of Binding Theory, pronominal binding, NPI licensing and opacity. The same connectivity effects do not hold if the post-copular element is read as predicational.

	- a. [The only person Narcissus likes] is himself .
	- b. Narcissus only likes himself .
	- a. [The only person he/∗ likes] is Narcissus .
	- b. He only likes Narcissus .
	- a. [The only person every Italian cares about] is his mother.
	- b. Every Italian only cares about his mother.

The (a)-examples above are puzzling because Binding is normally taken to require syntactic c-command between the binder and the bindee. However, one can clearly see that in each of the (a)-examples the post-copular DP is not superficially c-commanded by the relevant DP. Yet, in (4a) the anaphor is bound by *Narcissus*, in (5a) co-reference between the pronoun and *Narcissus* is impossible, and in (6a) the quantified expression binds the post-copular pronoun.

The examples (7)–(9) show that question-answer pairs exhibit the same type of connectivity. Nothing overtly c-commands the anaphor in (7) for example. Yet, Principle A is somehow satisfied.



Two main lines of analyses have been proposed to account for connectivity effects: a conservative/syntactic line (see Ross 1972, den Dikken et al. 2000, Schlenker 2003, Romero 2007, 2018 among others) and a revisionist/semantic one (see Jacobson 1994, Sharvit 1999, Cecchetto 2000 among others).

Among the syntactic approaches, the one I will focus on is the *Question plus deletion (Q+D)* account as developed by Schlenker (2003) and Romero (2007, 2018). According to the proponents of this approach, the specificational sentence (10a) displays the same behavior as (10b) simply because at some level of representation, (10a) contains a connected clause like (10b), which is then partially elided, as informally shown in (11).

	- b. Narcissus/everybody likes himself

It is in the syntax of this partially elided clause that binding is satisfied. Thus, once one takes into account the reconstructed clause, connectivity is straightforwardly accounted for. What about the interpretation? How is the meaning of a connectivity sentence derived under Q+D?

Under this account as presented by Schlenker (2003) and Romero (2007), the specificational subject is analyzed as a concealed question, the post-copular constituent is treated as a partially elided answer, and *be* simply denotes the identity function between the two.

Specificational subjects are analyzed as concealed questions whether or not they superficially look like questions. That is, not only are English pseudoclefts taken to denote questions, but so are other specificational subjects that do not resemble questions at all. This includes nominal relative clauses like (12), which will be the main focus of this paper.

(12) The person Narcissus loves is himself.


For these nominal subjects, Schlenker (2003) suggests that *the* spells out the definiteness feature of a concealed *wh-word* in a similar fashion to the object of *know* in (13).

#### (13) John knows the capital of Italy. (Schlenker 2003)

The relevant reading of (13) does not say that John knows Rome. It asserts that John knows *what the capital of Italy is*. That is, he knows the answer to the question *What is the capital of Italy?* Similarly, the pre-copular element in (12) is taken to have the same denotation as the question *who is the person Narcissus loves?*, where – crucially – the definite description is read as predicational.

Here is how the meaning of (12) is derived by the Q+D as it currently stands in the literature. Schlenker (2003) adopts Groenendijk & Stokhof's (1984) semantics for the meaning of the specificational subject. In Groenendijk & Stokhof's (1984) semantics, the denotation of a question is its exhaustive true answer in the world w. In order to implement that, Romero (2007) assumes that the specificational subject is equipped with a silent answer operator ans, defined in (14). ans's job is to turn the intension of the nominal into a question meaning.

$$\text{(14)}\quad [\![\text{ans}]\!] = \lambda y_{\langle s,e\rangle}.\,\lambda w.\,\lambda w'.\ y(w') = y(w)$$

Thus, (12) has the LF in (16a). The question denoted by the pre-copular phrase is equated to the *strengthened* value of the partially elided answer. The strengthened value of the answer is equal to its normal semantic value *plus* its implicature of exhaustivity arising from focal stress. In this case, the right-hand side of the equation includes the value in (15b), as well as the implicature triggered by focal stress on *himself*, shown in (15c). The implicature's contribution is that all the other alternatives to *himself* in (15a) are negated and, as a result, we interpret (15a) to mean that Narcissus only likes *himself*. (12) then has the meaning in (16b), which has the rough paraphrase in (17).

	- b. Assertion: λw.like(n,n,w)
	- c. Implicature: λw.∀x: x≠n → ¬like(n,x,w)
	- b. λw [ λw'. ιx [like(n,x,w')] = ιx [like(n,x,w)] = λw'. like(n,n,w') & *Narcissus likes nobody else in w'* ]

(17) Paraphrase of (16a)

'We are in a world w such that: the exhaustive answer to the question "who is (the) person Narcissus likes" in w is the proposition "that Narcissus likes himself (and nobody else)"'.
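The computation in (14)–(17) can be checked on a toy model. The sketch below assumes four hypothetical worlds that differ only in whom Narcissus likes, and it represents propositions as sets of worlds. The names `WORLDS`, `the_liked`, `ans`, `plain_answer` and `strengthened_answer` are illustrative assumptions and are not taken from the Q+D literature.

```python
# Toy model: each hypothetical world lists the individuals Narcissus likes.
WORLDS = {
    "w1": {"narcissus"},
    "w2": {"echo"},
    "w3": {"narcissus"},
    "w4": {"narcissus", "echo"},
}


def the_liked(w):
    """Individual concept for 'the person Narcissus likes' (defined only if unique)."""
    liked = WORLDS[w]
    return next(iter(liked)) if len(liked) == 1 else None


def ans(concept):
    """ans as in (14): maps w to the set of worlds w' where concept(w') == concept(w)."""
    return lambda w: frozenset(v for v in WORLDS if concept(v) == concept(w))


def plain_answer():
    """Assertion only: Narcissus likes himself."""
    return frozenset(w for w, liked in WORLDS.items() if "narcissus" in liked)


def strengthened_answer():
    """Assertion plus exhaustivity implicature: he likes himself and nobody else."""
    return frozenset(w for w, liked in WORLDS.items() if liked == {"narcissus"})


question = ans(the_liked)
print(question("w1") == strengthened_answer())  # the equation holds in w1
print(question("w1") == plain_answer())         # it fails without the implicature
print(question("w2") == strengthened_answer())  # it fails where he likes only Echo
```

In this toy setting the identity between the question meaning and the answer holds only when the answer is strengthened, which is the role the text assigns to the exhaustivity implicature.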

I will now turn to the other main line of analysis, the *semantic or revisionist accounts* (Jacobson 1994, Sharvit 1999, Cecchetto 2000 among others). Revisionists do not take Binding (Scope or NPI licensing) to require a structural condition like c-command, at least not in specificational sentences. For this reason, they do not need to posit elided structure in the post-copular constituents. Connectivity instead results from a higher-order semantics (supplemented with rule I of Reinhart (1983)). Sharvit (1999) for example appeals to quantification over functions and analyzes the reflexive as an identity function. Under her approach, the example in (12) has the LF in (18a) and the semantics in (18b). It denotes the equation between the unique function that maps Narcissus to the person he likes and the function that maps everyone to themselves.

(18) a. ⟦[The person Narcissus likes] is [himself]⟧

b. λw.ιf<sub>⟨e,e⟩</sub> [∃x.like(n,f(x),w)] = λx.x

In this paper, I will show that certain specificational subjects pose a challenge to proponents of the syntactic *Q+D* account. In §2, this problem is first illustrated using Romance data where superlative import requires relativization. The issue boils down to what follows. Under a syntactic account, a conflict emerges between the required syntax of the post-copular clause and its desired interpretation. That is, the structural configuration that *Q+D* requires to satisfy Binding (or more generally to explain connectivity) cannot generate the intended superlative interpretation. A similar problem does not arise for semantic accounts, which maintain that variable binding does not require c-command and can, therefore, straightforwardly derive the correct meaning. In §3, I then show that the challenging Romance data is part of a larger and more general problem with specificational subjects that contain relative clauses. It has to do with the fact that, under Q+D, the information in the head of the relative clause gets lost in the reconstructed post-copular clause.

The goal of this paper is merely to highlight the existence of this challenge for a Q+D analysis and spell out what the desiderata of a promising syntactic account are. §4 includes a short discussion of those desiderata. The proposal of a new syntactic account is left to future research.

#### **2 Superlatives in Romance**

In this section, I show how Romance data involving superlative phrases present a novel interesting challenge for Q+D accounts. The problem clearly arises in cases like (19) where the pre-copular phrase is a definite relative clause that embeds a superlative, and it boils down to what follows. The level of representation that satisfies binding does not yield superlative import and vice versa.

(19) La persona con cui Lenuccia è più gentile è se stessa.

the person with whom Lenuccia is more kind is herself

'The person Lenuccia is the kindest to is herself.'

Appreciating that there is a puzzle here requires some background on relative readings of superlatives and how those readings are obtained in Italian (and other Romance languages). For the purpose of this paper, discussion of one specific example will be enough. For a more detailed investigation of the phenomenon in Romance, I refer the interested reader to Loccioni (2018).

First, Romance lacks the morphological distinction between *more* and *most*. In both comparatives and superlatives, Italian makes use of the morpheme *più* that I will gloss as 'more' throughout the paper. In addition to this comparative morpheme, superlative import requires the presence of some definite marker. In the absence of a definite marker, the only available interpretation is a comparative one, as shown by (20).


It will not be surprising then that (22) and (23) can only have a comparative interpretation.



However, in Italian and other Iberico-Romance languages, embedding the comparative phrases above inside a definite relative clause can generate superlative import.


What the contrast between (22) and (24) shows is that relativization is necessary to get the desired superlative interpretation in these examples.<sup>1</sup>

With these facts about Romance at hand, I can now go back to the problem of connectivity in specificational sentences. The crucial data point for my argument will be one where a relative clause like (24) is plugged into the pre-copular position of a specificational sentence which exhibits connectivity.

Below are the relevant examples which show connectivity effects with respect to (i) Principle A, (ii) Principle C and (iii) Pronominal binding.<sup>2</sup>

(26) [ La persona con cui Maria è più esigente ] è se stessa

the person with whom Maria is more demanding is herself

'The person with whom Maria is the most demanding is herself'

<sup>1</sup>Adding a local definite determiner adjacent to *più* instead of embedding the comparative inside a definite relative clause would not work. It would either provide the wrong interpretation or it would be ungrammatical. In the case of (i), the resulting sentence could only have the absolute (a)-interpretation, which is different from the one I am after in (24).

	- a. Maria is more demanding with Lucia/herself than anybody else is.
	- b. \* Maria is more demanding with Lucia/herself than she is with anybody else.

Adding a definite determiner to (23) results in a sharply ungrammatical sentence.

(ii) \* Piero ha letto i più libri

Piero has read the more books

<sup>2</sup>Whereas (26) and (27) are fully acceptable, pronominal binding sentences such as (28) are subject to a greater extent of speaker variation. I personally find them less than perfect.


Let's turn to how a *Q+D* account explains these cases. Take (26) for example.<sup>3</sup> If the post-copular phrase is a partially elided clause, binding is easily accounted for. As shown in (29), Maria c-commands the anaphor *se stessa*.

(29) [ la persona con cui Maria è più esigente ] è [ Maria è più esigente con se stessa ]

[ the person with whom Maria is most demanding ] is [ Maria is more demanding with herself ]

However, the syntactic configuration that is able to satisfy Condition A of Binding Theory fails to generate the right superlative meaning. As discussed above, superlative import in the specificational subject rests on the fact that the superlative is embedded in a definite relative clause. It is only in that environment that *più esigente* (lit. "more demanding") can have a superlative interpretation. Once relativization is undone in the partially elided post-copular constituent, the superlative interpretation becomes unavailable, and the only possible reading for *più esigente* is a comparative one. The contrast between the meaning of the pre-copular and post-copular phrases is illustrated by the translation in (29).

<sup>3</sup>A very similar example (with virtually the same properties) can be built for Spanish:

(i) La persona con la que María es más exigente es ella misma

the person with the which María is more demanding is herself

'The person with whom Maria is the most demanding is herself'

French superlatives do not have the same properties as Iberico-Romance languages such as Italian and Spanish. For this reason, French does not have the exact counterpart of (26). In order to construct the relevant example, one could use what I called *de*-free relatives. As far as I know they are the only French predicative constructions where a superlative interpretation can arise without determiner doubling (see Loccioni 2018 for discussion):

(ii) Ce que Jean a de plus précieux, c'est lui-même

this that Jean has *di* more valuable it.is himself

'The most valuable thing Jean has is himself'

What a defender of the Q+D account would have to say is that the correct truth conditions result from the computation of the relevant implicature (that arises from focal stress). That is, once the implicature is factored in, the desired interpretation is obtained. In order for it to work, they would have to maintain that in (29) the answer not only asserts that Maria is more demanding with herself but also *implicates* that there is nobody else she is more demanding with (than she is with herself). (30) shows the semantics of the assertion and the implicature, using the degree-based lexical entries of *demanding with*, *er* and *est* provided in (31).

	- b. Assertion: λw.∃d[demanding(m,m,d,w) ∧ ¬demanding(m,g(1),d,w)]
	- c. Implicature: λw.∃d(demanding(m,m,d,w) & ∀y∈Y [y≠x → ¬demanding(m,y,d,w)])

$$\text{(31)}\quad \text{a. } [\![\text{demanding-with}]\!] = \lambda w.\,\lambda d.\,\lambda x.\,\lambda y.\ \text{demanding}(y,x,d,w)$$

$$\text{b. } [\![\text{-er}]\!] = \lambda P_{\langle d,t\rangle}.\,\lambda Q_{\langle d,t\rangle}.\ \exists d\,[Q(d) \land \neg P(d)] \qquad \text{(Bhatt \& Takahashi 2011)}$$

$$\text{c. } [\![\text{-est}]\!] = \lambda Y_{\langle e,t\rangle}.\,\lambda P_{\langle d,et\rangle}.\,\lambda x_{e}.\ \exists d\,(P(x,d)\ \&\ \forall y\in Y\,[y\neq x \to \neg P(y,d)]) \qquad \text{(Heim 1999)}$$

Example (30b) shows that the normal value of the answer to the question *Who is Maria the most demanding with?* is that Maria is more demanding with herself than she is with some relevant other (g(1) in (30b) refers to this contextually relevant standard of comparison). The strengthened value of (30a), however, is that Maria is more demanding with herself than she is with anybody else. It is unclear how this implicature would work. How can focal stress on the anaphor be responsible for the gap between a simple *than*-clause and a quantified one (more than anybody else)?
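The gap between the comparative assertion and the superlative reading can be illustrated with a small sketch of the entries in (31b) and (31c). It assumes a hypothetical degree scale from 0 to 10, hypothetical maximal degrees for three individuals, and a monotone reading of *demanding with*; the names `DEGREES`, `MAX_DEGREE`, `demanding_with_maria`, `er` and `est` are illustrative only.

```python
DEGREES = range(0, 11)  # hypothetical degree scale

# Hypothetical maximal degrees to which Maria is demanding with each person.
MAX_DEGREE = {"maria": 7, "lucia": 5, "gianni": 9}


def demanding_with_maria(d, x):
    """Maria is demanding with x to degree d (monotone: true for every d up to the maximum)."""
    return d <= MAX_DEGREE[x]


def er(P, Q):
    """-er as in (31b): some degree satisfies Q but not P."""
    return any(Q(d) and not P(d) for d in DEGREES)


def est(Y, P, x):
    """-est as in (31c): some degree holds of x and of no other member of Y."""
    return any(
        P(d, x) and all(not P(d, y) for y in Y if y != x)
        for d in DEGREES
    )


# Assertion in (30b), taking lucia as the contextual standard g(1).
assertion = er(
    lambda d: demanding_with_maria(d, "lucia"),
    lambda d: demanding_with_maria(d, "maria"),
)
# Superlative reading of the pre-copular phrase, over all alternatives.
superlative = est(set(MAX_DEGREE), demanding_with_maria, "maria")

print(assertion, superlative)
```

With these assumed values the comparative assertion is true while the superlative is false, because one alternative outranks the standard of comparison. Any strengthening of the answer would therefore have to supply the universal quantification over alternatives, which is the step the text questions.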

A revisionist account *à la* Sharvit (1999) would not run into the same problem. (26) would have the LF in (32a) and would denote the equation between the unique function that maps Maria to the person she is the most demanding with and the function that maps everyone to themselves, as shown in (32b).

	- b. λw.ιf<sub>⟨e,e⟩</sub> [∃d.∃x.demanding(m,f(x),d,w) & ∀y∈Y [y≠x → ¬demanding(m,y,d,w)]] = λx.x

In the next section, I will show that this problem is not limited to relative clauses containing superlatives. Rather, it is part of a larger issue that a Q+D account runs into when dealing with relative clauses in subject position.

#### **3 A larger problem with relativization**

The problem discussed in the previous section was somewhat language-specific. It had specifically to do with the fact that in languages like Italian, relativization can be a necessary element to get superlative import in certain environments. In specificational sentences then, a tension emerged between the desired structure needed to account for connectivity and the meaning of the construction.

In this section, I want to show that a similar challenge for Q+D accounts arises when relative clauses are specificational subjects more generally. It has to do with the fact that the two clauses that are equated do not contain the same level of information. Since the right-hand side of the equation has the syntax of a connected clause, it only contains part of the pre-copular phrase meaning. In particular, the information included in the head of the relative clause is absent in the post-copular clause.<sup>4</sup>

To show that, take the English sentences in (33). Under the Q+D account as it currently stands in the literature, these sentences are analyzed as in (34). One can easily notice that despite the different meanings of the subjects, they all share the very same connected sentence on the right-hand side, *he cared about John's mother*.

	- b. The oldest woman he cared about is John's mother
	- c. The youngest woman he cared about is John's mother
	- d. The last sick person he cared about is John's mother
	- b. [The oldest woman he cared about] is [he cared about John's mother]
	- c. [The youngest woman he cared about] is [he cared about John's mother]
	- d. [The last sick person he cared about] is [he cared about John's mother]

<sup>4</sup>The same applies to elements in the left-periphery of the DP, such as demonstratives and ordinal numbers.


In order to make the computation work and derive the correct interpretation, the meaning gap between the specificational subject and the post-copular clause has to be filled. According to the Q+D account, this role is played by the implicature triggered by focus on *John's mother*. Thus, the meaning of 'he cared about John's mother' has to be strengthened differently in each case to derive the corresponding exhaustive answers, as shown in (35).

	- b. He cared about John's mother *and no woman older than that*
	- c. He cared about John's mother *and no woman younger than that*
	- d. He cared about John's mother *and no other sick person after her*

Again, it is unclear how these implicatures work. In (35a), focus on *John's mother* would trigger the implicature that there is nobody else he cared about. However, the implicatures triggered in the other examples are fairly different. In these cases, the presupposition of the superlative requires that there are other people he cared about, but they have to fit a certain description, which is provided by the head of the specificational subject. (33b), for example, triggers the implicature that he did not care about any woman who is older than John's mother, whereas the implicature in (33c) has quite the opposite effect. The fact that this information is "retrieved" by an implicature seems quite undesirable. It would be much preferable if the post-copular phrase contained this information. Yet, we don't want to end up with something like (36), where on the right-hand side we obtain the connectivity sentence we started with.

(36) The oldest woman he cares about is [ his mother is the oldest woman John cares about ]

In the discussion of a similar case, Schlenker (2003) recognized that the precise way in which the implicature is calculated is unclear. His analysis of (37) is that *(He worries about) himself* triggers the implicature that there is no problem that John worries about more.

(37) The worst problem that John worries about is himself

He writes: "How the implicature comes about is unclear (focus seems to be crucial); but once it is assumed that such an implicature exists in (37), the mechanism developed [...] to equate the value of the pre-copular element with the strengthened value of the post-copular element will presumably yield the desired results." But where does the degree scale come from in this example? Why would the assertion *John worries about himself* generate the implicature that he is his worst problem?

#### **4 Desiderata for a syntactic account and conclusion**

I am afraid that this paper is going to disappoint the reader who is looking for a solution to rescue a syntactic account of connectivity sentences. Even though I am not able to offer a solution at this point, I would like to spell out what (some of) the desiderata of a more promising syntactic account are, given what we have learned.

A good syntactic account that does not want to abandon the c-command tests needs to have a good story for what a "connected" elliptical structure with the right interpretation looks like.

For fragment sentences, Merchant (2004) suggests that ellipsis is preceded by an A'-movement to a clause-peripheral position.

	- b. [FP [DP John]<sub>i</sub> [F′ F < [TP she saw t<sub>i</sub> ] > ]] (adapted from Merchant (2004))

Given the similarities between specificational sentences and question/answer pairs, one could explore whether a similar account would work in the case of post-copular phrases of specificational sentences. It would translate into moving [ himself ] to some focus position – either the high left periphery position posited by Rizzi (1997) or the low left periphery argued for by Belletti (2001).

(39) What Narcissus likes is [ himself [ Narcissus likes t ] ]

If the post-copular phrase is moving from a connected sentence, connectivity effects are easily accounted for. However, other issues arise. First, Italian strongly disallows preposition stranding, which would result from such a derivation. (40) provides an example of it.

(40) L'unica persona con cui Maria è esigente è [ se stessa [ Maria è esigente con t ] ]

Second, the data I presented in §2 showed that the post-copular phrase should have some (form of) maximalization as part of the derivation to derive the correct superlative interpretation. Even though topicalization has very obvious effects on the information structure, it is not enough to get superlative import in Italian. This is illustrated in (41), which can only have the (a)-interpretation.


	- a. WITH HERSELF MARY is more demanding.
	- b. \* WITH HERSELF MARY is the most demanding.

Third, §3 showed that the derivation should have some mechanism to "remember" the content of the head of the relative clause in the specificational subject. This means that, even though (42b) accounts for binding, it cannot be the full story for (42a).

(42) a. The oldest person John cares about is himself

b. [The oldest person John cares about] is [himself John cares about t ]

It seems therefore unlikely that a simple movement akin to topicalization could be extended from fragment answers to post-copular phrases of specificational sentences.

Lastly, any satisfactory syntactic approach should also make sure to avoid the circularity of obtaining another specificational sentence on the right-hand side.

(43) The oldest person John cares about is [ himself is the oldest person John cares about ]

#### **References**


## **Chapter 10**

## **Revisiting sociophonetic competence: Variable spectral moments in phrase-final fricative epithesis for L1 & L2 speakers of French**

### Amanda Dalola<sup>a</sup> & Keiko Bridwell<sup>b</sup>

<sup>a</sup>University of Minnesota <sup>b</sup>University of Georgia

Phrase-final fricative epithesis (PFFE) is a phenomenon in Continental French in which utterance-final vowels lose their voicing and yield fricative-like whistles corresponding to the identity of the host vowel. PFFE is also well attested among L2 speakers; differences in its production across speaker groups have been reported in several domains: vowel type, speech rate, register, constituent location, fricative-vowel ratio (FVR), and measures of center of gravity (COG). Participants completed a reading task targeting 98 tokens of /i,y,u/ in phrase-final position. 4569 PFFE segments were examined via 7 frames of 8 ms in length with 2 ms in overlap across their trajectories. Results suggest categorical speaker group differences for /y/ in terms of skewness, as well as for kurtosis and for intensity at higher FVRs. This suggests that L1/L2 sociophonetic realizations contain nuanced differences far beyond the presence/absence paradigm still common to many variationist inquiries.

### **1 Introduction**

Phrase-final fricative epithesis (PFFE), also known in the literature as phrase-final vowel devoicing (PFVD), refers to a well-attested phenomenon in Continental French (CF) in which breath group-final vowels lose their voicing and produce a short burst of high-frequency aperiodic energy akin to a fricative, e.g. *mais oui-hhh* [mɛwiç], *merci beaucoup-hhh* [mɛʁsibokux] (see Figure 1). The first linguistic description of this phenomenon characterized it as the emergence of "sharp, phrase-final whistles" (Fónagy 1989); subsequent research witnessed a split in nomenclature, with North American researchers often opting for a name focusing on voicing loss, "vowel devoicing / *dévoisement vocalique*" (Fagyal & Moisset 1999, Smith 2002, 2003, 2006, Martin 2004), and most European researchers preferring a name focusing on the emergence of the downstream fricative, "fricative epithesis / *épithèse (consonantique) fricative*" (Fagyal 2010, Candea 2012, Candea et al. 2013). Because the present study will focus on characterizing the spectral and durational qualities of the emergent fricative, we, the (North American) authors, have explicitly chosen to heed the call of our European predecessors in adopting the term "fricative epithesis" for this discussion, a precedent we first established in Dalola & Bridwell (2020).

Figure 1: PFFE on the spectrogram: *venu* 'came'. The PFFE corresponds to the final, highlighted segment, which is characterized by the lack of a voicing band on the spectrogram and the presence of aperiodic energy in the waveform. The PFFE immediately follows the articulation of the vowel [y], which is characterized by full formant structure on the spectrogram and periodic energy in the waveform.

In the first description of PFFE in the literature, Fónagy (1989) hypothesized that not only do the characteristic phrase-final fricatives appear immediately following vowels that have lost a portion of their voicing band, but they themselves might also correspond to their host vowel phonetically in terms of their backness dimension. Citing the *ich-Laut / ach-Laut* harmony phenomenon in standard German, in which the backness value of a voiceless fricative is selected by the backness value of its preceding vowel, he hypothesized that the fricatives epithesized after the high front vowels /i/ and /y/ in French should be more [ç]-like, i.e. front, than those appearing after high back /u/, which should be more [x]-like, i.e. back. This observation was corroborated by Dalola (2015a) who examined measures of center of gravity (COG: average peak frequency) taken at the 1/4, 1/2 and 3/4 timepoints of PFFE fricatives produced by L1 CF speakers and found evidence to suggest a three-way distinction in spectral energy at the first two timepoints. These spectral differences, however, could not be characterized in terms of sheer [+/- back] and did not persist into the second half of the segment.

#### **1.1 Phonological Predictors of PFFE**

The best-studied dimension of PFFE is undoubtedly its phonological distribution. Originally described as occurring in high vowels (Fónagy 1989), PFFE has been documented in the full inventory of French vowels, including nasals (Smith 2006), but has been reported at the highest rates following the high vowels /i,y,u/ (Fagyal & Moisset 1999, Martin 2004, Smith 2003, 2006). When comparing reading passages, role-plays and impromptu conversation, PFFE has been found to occur at significantly higher rates in read, i.e. planned, speech (Fagyal & Moisset 1999, Dalola 2014). This finding is perhaps explained by its higher rates of occurrence at the ends of both the intonation phrase and the declarative phrase (Fagyal & Moisset 1999, Smith 2003) where French sees the emergence of a low tone. Studies have also found an effect for the manner type of the preceding consonant, such that preceding stops condition PFFE at a significantly higher rate than more sonorous manner types, as well as lexical frequency effects, such that more frequent lexical items are more likely to exhibit the phenomenon than less frequent ones (Dalola 2015b). Dalola & Bridwell (2019, 2020) also found that PFFE varies with the proportional duration of the fricative relative to the full vowel (i.e. FVR, as described in §2.4), such that among high front vowels, longer fricatives are produced with higher COGs, indicating that epithesized fricatives may be conditioned by the same principles of hypo- and hyperarticulation (Lindblom 1990) as other phonological segments.

#### **1.2 Social Predictors of PFFE**

The social distribution of PFFE presents a complex series of macro- and microgroup associations. Early work often described PFFE as occurring in the speech of women (Fónagy 1989, Fagyal & Moisset 1999, Smith 2006); however, later work has reported the variable to be used at similar rates among both men and women (Candea 2012, Candea et al. 2013, Dalola 2014). Fagyal & Moisset (1999), who took a categorical approach to age, found the variable at its highest rates among their youngest (16–35) and oldest (61–85) groups; Dalola (2014), who operationalized age continuously (testing ages 13–83), reported participants as more likely to use PFFE the older they were. From a socioeconomic standpoint, PFFE is often associated with the French middle class (*la bourgeoisie*) (Paternostro 2008, Fagyal 2010). Originally, the variable was associated with Parisians (Fagyal & Moisset 1999, Smith 2006, Fagyal 2010), though in recent years, it has also been documented in the speech of francophones from other metropolitan centers in France, namely Lyon and Strasbourg (Dalola 2014). Further afield, the variable has also been described in the speech of French, Belgian, and Canadian news anchors (Paternostro 2008, Candea et al. 2013); one study introduced intersectionality into this association by reporting it as a characteristic of young, i.e. inexperienced, news anchors (Candea 2012). The disagreement among social predictors may be reflective of the rapidly changing landscape of PFFE users and/or the generalization of the variable from specialized demographics to less specialized ones. Despite the inconclusive findings among social predictors, it is important to recognize them in pursuing research on the characterization of the PFFE variable.

#### **1.3 Perception of PFFE**

Differences in L1 versus L2 production of PFFE ushered in a rigorous examination of potential speaker group differences in the variable's perception. Dalola (2016) reported significant differences in L1 and L2 perceptions of the PFFE, namely that L2 speakers perceived it as a marker of "formality" and "trustworthiness," whereas L1 speakers perceived it variably, sometimes as a marker of "admirability" and other times as a marker of "intense emotional affect." Using a matched-guise design and exploratory factor analysis, a related form of principal component analysis that partitions out the shared variance of each variable from its unique and error variance to reveal the underlying factor structure (Osborne & Costello 2009), Dalola's (2016) L2 participants rated users of PFFE similarly for two separate groups of adjectives: *polite, well-educated, speaks clearly, speaks formally*, a category the author refers to collectively as traits of formality, and *confident, persuasive, I respect X, I trust X*, a category the author refers to collectively as traits of trustworthiness. Like their L2 counterparts, L1 speakers also rated users of PFFE similarly for two separate groups of adjectives; however, the adjectival members of the groups were both more numerous and compositionally different: *well-educated, professional, speaks clearly, polite, intelligent, patient, confident, persuasive, I trust X, I respect X, I believe what X says, I would like to speak like X* formed a category referred to collectively as traits of admirability, while *aggressive, bourgeois, superficial, bossy, native French speaker, speaks with emotion* formed a category referred to collectively as emotional affect. It should be noted that all the traits that made up the L2 category of formality were also present in the L1 category of admirability, and that the difference in category name was due to the author's desire to assign names that applied to the full collection of adjectives. No gender effects were found for the voices being rated; however, there was a significant gender effect among those giving ratings, such that women were more likely than men to assign higher ratings for the adjective profiles overall.

#### **1.4 L2 Speakers & PFFE**

Given its salient phonetic energy and robust distribution among native francophone populations, it is somewhat unsurprising to learn that PFFE, despite its status as a sociophonetic variable, is also readily employed by L2 French speakers (Dalola & Bullock 2017). Investigating the nature of L1 and L2 PFFE as produced in different genres of speech, the authors revealed subtle differences at every level of production. For rates of use of PFFE, L1 and L2 speakers performed similarly overall but were motivated by different genres of speech: L1 speakers used more PFFE in role-plays while L2s were more likely to use it when reading wordlists. In terms of PFFE duration, or the proportional length of the epithesized fricative when compared to its host vowel, larger differences between speaker groups were documented: not only did L1 and L2 speakers produce PFFE segments that were statistically different in length (L1 PFFE length > L2 PFFE length), but each group showed sensitivity to a different linguistic parameter. L1s produced longer PFFEs as a reaction to pragmatic shifts (indicated in the role-plays via scenario-setting prompts), producing longer PFFEs in slower and formal speech, while L2s produced longer PFFEs as a reaction to task shifts (indicated by shifting from role-plays to wordlist), producing longer PFFEs in the wordlist task. Despite the various pragmatic and speaker group effects in this study, no effects were found across participants for measures of gender or age. In the first study to examine spectral differences between L1 and L2 users of PFFE, Dalola & Bridwell (2020) examined COG measures taken throughout the fricative and found L1 speakers to produce PFFE differently, as a function of phonological context, while L2 speakers instead were found to produce a singular, high-energy, hyperarticulated allophone in a majority of environments.
The discovery of spectral differences in terms of COG measures, a metric commonly used for diagnosing fricatives (segments that are notoriously difficult to describe due in part to their sustained aperiodic energy), therefore, invites the investigation of the other spectral moments commonly used to talk about fricatives, namely standard deviation, skewness, and kurtosis, as well as independent measures of intensity.

#### **1.5 Spectral Moments**

The use of the four spectral moments in diagnosing the energy of fricatives is common practice in phonetic studies, as it has been reported that listeners tend to classify voiceless obstruents from spectral information during the first 40 ms of the segment (Forrest et al. 1988). The four spectral moments used to classify fricatives are individually center of gravity (COG), standard deviation (SD), skewness, and kurtosis; each is a unique statistical manipulation of the segment's energy profile meant to capture a different quantitative aspect of the aperiodic energy sustained throughout the segment. If one conceives of the various frequencies produced during a fricative in terms of a normal distribution (see Figure 2), then the COG corresponds to the measure of central tendency among the frequencies, or the spectral mean. The standard deviation is then the measure of spread, or the average variation in frequencies from the spectral mean during the segment. As with distributions, skewness corresponds to a measure of energy distribution above or below the spectral mean, or spectral tilt, while kurtosis corresponds to a measure of the distribution of energy at the tail extremities, or tailedness. In addition to the four spectral moments, this study will also take measurements of amplitude, also called intensity, which is perceived by listeners as loudness. Although not one of the four spectral moments, intensity has also been shown to be a relevant parameter for the perception of obstruents (Forrest et al. 1988).
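The four definitions above can be made concrete with a short computation. The following is a minimal sketch in Python, not the procedure used in this study: it assumes a power spectrum is already available as two parallel lists (frequencies in Hz and the energy at each frequency), treats the normalized spectrum as a probability distribution, and reports kurtosis as excess kurtosis (subtracting 3), which is one common convention. The function name `spectral_moments` and the toy spectrum are illustrative assumptions.

```python
import math

def spectral_moments(freqs_hz, power):
    """Return (cog, sd, skewness, kurtosis) of a power spectrum.

    The spectrum is normalized so that its energy sums to 1 and is then
    treated as a probability distribution over frequency (an assumption
    of this sketch, not a description of any particular software).
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    # First moment: center of gravity (spectral mean).
    cog = sum(w * f for w, f in zip(weights, freqs_hz))

    def central_moment(k):
        # k-th central moment of the frequency distribution.
        return sum(w * (f - cog) ** k for w, f in zip(weights, freqs_hz))

    variance = central_moment(2)
    if variance == 0:
        raise ValueError("all energy sits at a single frequency")

    sd = math.sqrt(variance)                          # spread around the mean
    skewness = central_moment(3) / variance ** 1.5    # asymmetry (tilt)
    kurtosis = central_moment(4) / variance ** 2 - 3  # excess tailedness
    return cog, sd, skewness, kurtosis

# Hypothetical, perfectly symmetric three-bin spectrum.
cog, sd, skew, kurt = spectral_moments([1000, 2000, 3000], [1, 2, 1])
```

For a symmetric toy spectrum like this one, the skewness is zero and the COG falls on the middle bin; shifting energy toward the higher bins would raise the COG and push the skewness negative.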

#### **1.6 Spectral moments as cues**

Although spectral moments and measures of intensity have been shown to aid listeners in identifying and discerning voiceless obstruents in languages, they have also been found working independently or in concert with other acoustic parameters to convey social information about the speaker. Zimman (2017) reported COG to be a marker of masculine voices when considered alongside fundamental frequency (f<sub>0</sub>). Munson et al. (2006) found the skewness of frequencies in [s] to be a relevant predictor of speaker sexuality, such that negatively skewed [s], or [s] segments with a higher concentration of energy above the spectral mean, served as a marker of gay men's speech. In terms of the PFFE variable, Dalola (2016) found intensity measures to be a predictor of affect ratings assigned by L1 French listeners to PFFE users, such that higher intensity PFFEs yielded higher ratings for features of intense affect (*aggressive, bourgeois, superficial, bossy, native French speaker, speaks with emotion*). Although relevant for the perception of intense affect, intensity measures were not found to be a predictor of features of admirability (*educated, professional, speaks clearly, polite, intelligent, patient, confident, persuasive, I trust X, I respect X, I believe what X says, I would like to speak like X*) among L1 French listeners, nor were they found to be at all relevant for L2 French listeners perceiving PFFE segments.

Figure 2: The four spectral moments, visualized as statistical transformations: a) center of gravity and standard deviation, b) skewness, c) kurtosis.

#### **1.7 Motivation**

This article reports on production differences in spectral tendencies in PFFE among L1 and advanced L2 speakers of Continental French. Since PFFE is a sociophonetic marker in CF (Dalola 2014, 2016), it presents an interesting testing ground for comparing spectral values across native and non-native speakers. Previous work has reported production differences in rate and degree of devoicing between native and non-native French speakers (Dalola & Bullock 2017), but only a few studies have extended the comparison to investigate the phonetic quality of PFFE's variable emergent fricatives (Dalola & Bridwell 2020). That study looked exclusively at COG measures and reported significant differences across speaker groups. However, fricatives are most commonly identified by a profile of four spectral moments, including measures of COG, SD, skewness, and kurtosis, not just COG alone. Combined with the many known articulatory differences and false similarities in vowel production between French and English (the L1 of the non-native population in this/previous studies), e.g. French /u/=[u] vs. English /u/=[uw], it is reasonable to expect that L1 articulatory behaviors may persist and could contribute to differences in spectral energy when realizing PFFE, even among advanced L2 speakers (Flege & Hillenbrand 1984, Flege 1987, 1995). The goal of this study, therefore, is to examine and characterize the fricatives epithesized after devoiced vowels using measures of fricative-vowel ratio (length of fricative divided by length of full vowel), COG (average peak frequency during PFFE segments), SD (variation in frequencies from the spectral mean during PFFE segments), skewness (energy distribution above or below the spectral mean during PFFE segments), kurtosis (energy distribution at the tail extremities during PFFE segments), and intensity (loudness of PFFE segments). We will then use inferential statistics to determine the presence of speaker group differences in PFFE in terms of measures of COG, SD, skewness, kurtosis, and intensity. Finally, we will investigate the role of vowel type and fricative-vowel ratio in modulating these speaker group differences.

#### **1.8 Research Questions & Hypotheses**

The current study puts forth the following research questions:


A previous study carried out by Dalola & Bridwell (2020) predicts significant differences across speaker groups for measures of COG, a hypothesis that is maintained in the present study. Predictions will not be offered for each of the other parameters, as previous work has not yet diagnosed these aspects of the PFFE variable among L1 or L2 speakers.

### **2 Methods**

#### **2.1 Participants**

Forty speakers of CF participated in the experiment, of whom 31 were L1-French speakers and nine were L1-English advanced L2-French speakers. All participants were recorded in Paris or Strasbourg in France or in the United States. Among the L1 participants, 23 were women and eight were men, ranging in age from 20 to 66 years (mean: 38.4 years). All L1 speakers were L2 speakers of English, having studied it formally for four or more years and using it in interactions once a week or more. Among the L2 participants, five were women and four were men, ranging in age from 27 to 58 years (mean: 38.6 years). L2 speakers were classified as "advanced" because they had all lived in France for at least two years, had prepared or were preparing an upper-level degree in French, and used French regularly in their careers. All L2 speakers were L1 speakers of American English. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the University of South Carolina Institutional Review Board (USC IRB).

#### **2.2 Stimuli**

Inspired by the task and pragmatic effects findings of Dalola & Bullock (2017) and several studies' reports of PFFE's robustness among news anchors reading off teleprompters (Paternostro 2008, Candea 2012, Candea et al. 2013), participants were asked to complete a reading task that consisted of 106 single sentences, containing 98 phrase-final tokens of /i,y,u/, occurring after all licit (C)C(C) onset sequences in one- to three-syllable real words in French (see Table 1 for a breakdown of consonant environments).



#### **2.3 Procedure**

Participants were presented with sentences one at a time on a MacBook Pro via Microsoft PowerPoint and told to read each one aloud, imagining they were reading a story to a native francophone listener. Participants were instructed to read each sentence twice and to repeat any trials from the beginning in the event of a disfluency. As they read aloud, participants were recorded via a head-mounted unidirectional cardioid microphone (SHURE WH20) plugged into a solid-state digital recorder (Marantz PMD 660) digitized at 44.1 kHz (16 bit). The task was completed under the direction of the L1-English advanced L2-French researcher; the task was self-paced and participants were given as much time as they needed to complete it.

#### **2.4 Measurements**

The final vowel of each target word was examined in Praat (Boersma & Weenink 2018) for the presence of PFFE, as indicated by the loss of a voicing band and the onset of high-frequency aperiodic energy (for reference, see Figure 1). All tokens exhibiting PFFE were then measured for the duration of the fricative and of the vowel itself. Since measures of raw duration do not take into account relative durational differences brought on by variable realizations of final lengthening, the derived measure of *fricative-vowel ratio (FVR)* was selected as the relevant durational variable. This measure was calculated according to the formula *FVR = F/V*, where *F* is the length of the epithesized fricative and *V* is the length of the full vowel including the fricative (Dalola & Bridwell 2020).
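As a worked illustration of the formula *FVR = F/V*, the following minimal Python sketch computes the ratio for a single token. The function name `fricative_vowel_ratio` and the durations in the example are hypothetical and are not taken from the data reported here.

```python
def fricative_vowel_ratio(fricative_ms, full_vowel_ms):
    """Return FVR = F / V.

    fricative_ms  : duration of the epithesized fricative (F)
    full_vowel_ms : duration of the full vowel, fricative included (V)
    """
    if full_vowel_ms <= 0:
        raise ValueError("full vowel duration must be positive")
    if not 0 <= fricative_ms <= full_vowel_ms:
        raise ValueError("fricative must fit inside the full vowel")
    return fricative_ms / full_vowel_ms

# Hypothetical token: a 60 ms fricative inside a 150 ms vowel.
fvr = fricative_vowel_ratio(60.0, 150.0)  # 0.4, i.e. 40% of the vowel
```

Because *V* includes the fricative, the ratio is bounded between 0 (no frication) and 1 (a fully devoiced vowel), which is what allows it to be compared across tokens with different amounts of final lengthening.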

Time-averaged spectral measures were obtained for each of the tokens by measuring the spectra of seven frames, each 8 ms in length and overlapping by 2 ms, from the central 44 ms of the fricative and averaging across these frames (see Toda 2007, 2009). This method of averaging across multiple windows was chosen in order to reduce the effects of the random fluctuation that is inherent in aperiodic energy (Toda et al. 2010). The resulting measures included the four spectral moments of *COG*, *standard deviation (SD)*, *skewness*, and *kurtosis*, and a measure of *intensity*.
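The framing arithmetic can be checked directly: seven 8-ms frames that overlap by 2 ms advance in 6-ms steps, so together they span 8 + 6 × 6 = 44 ms. The Python sketch below lays such frames over the center of a fricative and averages a per-frame measure; it is an illustration under stated assumptions rather than the script used in this study. The function names, the symmetric centering of the 44-ms window, and the example times are all hypothetical.

```python
def analysis_frames(onset_ms, offset_ms, n_frames=7, frame_ms=8.0, overlap_ms=2.0):
    """Return (start, end) times of overlapping frames centered in a fricative."""
    hop_ms = frame_ms - overlap_ms                 # 6 ms step between frame starts
    span_ms = frame_ms + (n_frames - 1) * hop_ms   # 44 ms with the defaults
    duration_ms = offset_ms - onset_ms
    if duration_ms < span_ms:
        raise ValueError("fricative is shorter than the analysis window")
    # Assumption: the window is centered symmetrically within the fricative.
    first_start = onset_ms + (duration_ms - span_ms) / 2
    return [
        (first_start + i * hop_ms, first_start + i * hop_ms + frame_ms)
        for i in range(n_frames)
    ]

def time_average(per_frame_values):
    """Average one spectral measure across the frames of a token."""
    return sum(per_frame_values) / len(per_frame_values)

# Hypothetical fricative running from 100 ms to 160 ms in the recording.
frames = analysis_frames(100.0, 160.0)
```

In this hypothetical case the first frame begins at 108 ms and the last one ends at 152 ms, leaving 8 ms of unanalyzed frication on either side. A fricative shorter than 44 ms cannot accommodate the window, which is consistent with such tokens being excluded from analysis.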

#### **2.5 Statistical Analyses**

Of the 7942 tokens collected, 4995 exhibited PFFE (62.9%). Of these, 426 (8.5%) were discarded as being too short (i.e. exhibiting under 44 ms of frication) or as spectrally unanalyzable at one or more of the timepoints, leaving 4569 tokens for analysis. Statistical analyses were conducted in R (R Core Team 2018), using the function *lmer()* from the packages *lme4* (Bates et al. 2015) and *lmerTest* (Kuznetsova et al. 2017) to run mixed-effects linear regression models for each of the five variables. This method of analysis was chosen because mixed-effects models allow for reliable comparisons between unequal sample sizes. In each model, *vowel*, *speaker group*, *FVR*, and all possible interactions between them were treated as predictor variables while *participant* was treated as a random effect.<sup>1</sup> Although normalized COG values have been used in previous studies to account for variation in vocal tract size (Dalola & Bridwell 2019, 2020), this proved to be redundant when *participant* was included as a random effect, and therefore, will not be used here. The proportion of the variance in the data explained by the random effects is reported in terms of the intraclass correlation coefficient (ICC), as calculated by the function *icc()* in the package *performance* (Lüdecke et al. 2021). Since post-hoc tests are not traditionally performed on mixed models (Levshina 2015), results below are interpreted in terms of 95% confidence intervals, as generated by the *effects* package (Fox 2003, Fox & Weisberg 2018, 2019). Visualizations were generated using *ggplot2* (Wickham 2016).
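To make the ICC concrete, the sketch below shows one common definition for a model with a single random intercept: the random-effect variance divided by the sum of the random-effect and residual variances. It is written in Python for illustration only and does not reproduce the *icc()* function of the *performance* package; the function name and the variance components in the example are hypothetical.

```python
def intraclass_correlation(random_effect_variance, residual_variance):
    """Share of (random + residual) variance attributable to the grouping factor.

    Assumes a model with one random intercept, e.g. participant.
    """
    if random_effect_variance < 0 or residual_variance < 0:
        raise ValueError("variances cannot be negative")
    total = random_effect_variance + residual_variance
    if total == 0:
        raise ValueError("no variance to partition")
    return random_effect_variance / total

# Hypothetical variance components, chosen only for illustration.
icc = intraclass_correlation(1.0, 3.0)  # 0.25
```

Under this definition, an ICC near 0 means that measurements from the same participant are no more alike than measurements from different participants, while an ICC near 1 means that nearly all of the variation lies between participants.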

#### **3 Results**

#### **3.1 COG**

As presented in Table 2, the results of the COG model indicated a significant three-way interaction between *vowel*, *speaker group*, and *FVR*. While the complexity of this interaction makes interpreting the statistical results difficult, a visual presentation of the effects generated by this model (Figure 3) makes the patterns in the data clear. These are interpreted below, with all differences reported as statistically significant, reflecting results from 95% confidence intervals. In terms of variation present in the data, the ICC measure of 0.338 suggests relatively low levels of similarity between measurements taken from the same speaker, indicative of a high level of variability throughout.

Overall, it can be seen that COG increases as FVR increases, a pattern which is true of all vowels and both speaker groups, although the rate of increase differs across vowels. At low FVRs (i.e. 0%), /y,u/ > /i/ for L1 speakers (*p*<.05); L2 speakers exhibit no significant differences between vowels here, although /y/ trends higher than /i/ (*p*=.071). At high FVRs (i.e. 100%), /i/ > /y/ > /u/ for L1 speakers (*p*<.05); for L2 speakers, /i/ > /y,u/ (*p*<.05), although the difference between /y/ and /u/ is nearly significant (*p*=.055). At no point do L1 speakers differ from L2 speakers for a given vowel and FVR in terms of statistical significance.

<sup>1</sup>Because repeated measures were taken for each word, initial models also included *word* as a random effect. However, examination of the ICCs for each model indicated that the grouping factor of *word* accounted for a very low amount of the total variation (1–3%). Since random effects with ICCs below 0.1 do not appreciably change the model (Vajargah & Nikbakhtt 2015), our models were therefore simplified to only include *participant* as a random effect.


Table 2: Mixed-effects linear regression model for COG

Figure 3: COG regression model effects

#### **3.2 Standard deviation**

As shown in Table 3, significant two-way interactions for *standard deviation* were present between *vowel* and *FVR* and between *FVR* and *speaker group*, while the interaction between *vowel* and *speaker group* was nearly significant. As visualized in Figure 4, patterns of standard deviation differed across vowels: it significantly increased with FVR for /i,u/ (*p*<.05), but decreased for /y/ (*p*<.05). As a result, /i,y/ > /u/ at low FVRs (*p*<.05), but at high FVRs /i/ > /u/ > /y/ (*p*<.05). In terms of speaker group, standard deviation increased with FVR for L1 speakers (*p*<.05) but not for L2 speakers (*p*>.10). At low FVRs, L2 speakers tended toward higher standard deviations than L1 speakers (*p*=.096); at high FVRs L1 speakers tended to show higher values (*p*=.080).

The ICC measure of 0.324 suggests relatively low levels of similarity between measurements taken from the same speaker, indicative of a high level of variability throughout.


Table 3: Mixed-effects linear regression model for standard deviation

Figure 4: Standard deviation regression model effects

#### **3.3 Skewness**

As shown in Table 4, a significant interaction for *skewness* was present between *vowel*, *FVR*, and *speaker group*, an interaction which primarily centers around the results for /y/. As can be seen in Figure 5, there is a significant decrease in skewness as FVR increases (*p*<.05), except for L2 speakers' production of /y/ (*p*>.10), which stays relatively stable across FVRs. Correspondingly, predicted values for /u/ are greater than those for /i/ across FVRs (*p*<.05), but /y/ behaves differently, patterning with /i/ at low FVRs but with /u/ at high FVRs. Finally, L1 speakers show higher skewness values than L2 speakers at low FVRs for /y/ (*p*<.05). A tendency is also present for L1 speakers' skewness to be higher than that of L2 speakers at low FVRs for /i/ (*p*=.060).


Table 4: Mixed-effects linear regression model for skewness

The ICC measure of 0.324 suggests relatively low levels of similarity between measurements taken from the same speaker, indicative of a high level of variability throughout.

Figure 5: Skewness regression model effects

#### **3.4 Kurtosis**

As shown in Table 5, a trending three-way interaction for *kurtosis* was present between *vowel*, *FVR*, and *speaker group* (which later proved to be fully significant when /y/ was compared with /u/); a significant two-way interaction was also present between *vowel* and *speaker group*, and a trending interaction between *FVR* and *speaker group*.

As can be seen in Figure 6, patterns for kurtosis closely paralleled those for skewness; indeed, the two variables were very highly correlated (r=0.911). For L1 speakers, kurtosis significantly decreased with FVR for all vowels (*p*<.05); for L2 speakers, this pattern was only significant for /u/ (*p*<.05). At low FVRs, /u/ was significantly higher than /i,y/ for both speaker groups (*p*<.05); for high FVRs, /u/ > /i/ for L1 speakers (*p*<.05), while /y/ > /i/ for L2 speakers (*p*<.05). This reflects the fact that L2 /y/ exhibited a non-significant increase in kurtosis as FVR increased, a pattern in the opposite direction of other vowels. Finally, L1 speakers tended to produce PFFE with higher kurtosis for /i/ and /y/ at low FVRs (*p*=.072, .080).
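The correlation reported above is a Pearson coefficient. As a minimal sketch of how such a value is obtained, the function below computes Pearson's *r* for two equal-length lists. The function name and the sample values are hypothetical and serve only as illustration; they are not data from this study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Hypothetical per-token skewness and kurtosis values (illustration only)
skewness_values = [0.2, 0.9, 1.4, 2.1, 2.8]
kurtosis_values = [-0.8, 0.1, 1.9, 4.0, 7.5]
print(round(pearson_r(skewness_values, kurtosis_values), 3))
```

A value close to 1 means that tokens with higher skewness also tend to have higher kurtosis, which is why the two measures pattern so similarly in the models.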

The ICC measure of 0.266 suggests relatively low levels of similarity between measurements taken from the same speaker, indicative of a high level of variability throughout.


Table 5: Mixed-effects linear regression model for kurtosis

Figure 6: Kurtosis regression model effects

#### **3.5 Intensity**


Table 6: Mixed-effects linear regression model for intensity

As shown in Table 6, a significant three-way interaction for *intensity* was present between *vowel*, *FVR*, and *speaker group*. The ICC for the intensity model, 0.592, was higher than for the four spectral moments, with inter-speaker differences accounting for more than half of the variation in the data. This reflects the fact that baseline loudness can markedly differ between speakers due to variation in microphone positioning and speaking style.
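For a model with a single random intercept, the ICC is the share of total variance attributable to the grouping factor. The sketch below shows that arithmetic. The variance components are invented for illustration (the chapter reports only the resulting ICCs) and were chosen so that the ratio matches the intensity model's 0.592.

```python
def icc(var_between: float, var_residual: float) -> float:
    """Intraclass correlation: between-group variance over total variance."""
    total = var_between + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total

# Hypothetical variance components (not taken from the chapter)
speaker_var = 5.92    # variance of the by-participant random intercepts
residual_var = 4.08   # residual (within-participant) variance
print(round(icc(speaker_var, residual_var), 3))
```

An ICC above 0.5, as here, means that differences between speakers account for more of the variation than differences within a speaker's own tokens.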

As shown in Figure 7, the most pronounced pattern was that intensity increased with FVR, a pattern which was significant for all vowels except L2 /u/ (*p*<.05). At low FVRs, L1 speakers exhibited no significant differences between vowels; L2 speakers produced /u/ with significantly higher intensities than /i/ (*p*<.05). At high FVRs, L1 speakers produced /y/ with higher intensity than /i,u/ (*p*<.05); L2 speakers produced /y/ higher than /u/ (*p*<.05) and tended to produce /i/ with higher intensity than /u/ (*p*=.058). There was also a tendency for L1 speakers to produce /i/ at low FVRs with greater intensity than L2 speakers (*p*=.073).

Figure 7: Intensity regression model effects

#### **4 Discussion**

#### **4.1 Speaker Group Differences**

Of the five parameters examined, categorical effects for speaker group were found only in the category of skewness and only for the vowel /y/, where L1 speakers exhibited significantly higher measures than L2s. Given that skewness is a metric of spectral tilt describing the distribution of frequencies occurring above or below the spectral mean, the higher measures reported for the L1 group indicate an L1 PFFE production for /y/ that is negatively skewed, because it contains a higher proportion of frequencies higher than the mean. Although skewness had not been examined in CF PFFE prior to this study, this finding parallels one reported by Munson et al. (2006) who found negatively skewed [s] to be a marker of American anglophone gay men's speech. The preponderance of higher frequencies in L1 speakers' negatively skewed /y/ may contribute to its increased saliency, rendering the phenomenon more easily detectable in L1 /y/ than in other L1 vowels and in all vowels where PFFE is produced by L2s. Future studies should seek to determine whether skewness plays a significant role in the perception of PFFE.
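To make the four spectral moments concrete, the following sketch computes them from a power spectrum treated as a probability distribution over frequency. This is one common formulation, not necessarily the exact procedure used in this study. The function name, the toy spectra, and the choice to report excess kurtosis (raw kurtosis minus 3) are assumptions made for illustration.

```python
from math import sqrt

def spectral_moments(freqs, power):
    """Return (cog, sd, skewness, excess_kurtosis) for a power spectrum.

    The power values are normalized to sum to 1 and used as weights
    over the corresponding frequencies (in Hz).
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")
    weights = [p / total for p in power]
    # First moment: center of gravity (weighted mean frequency)
    cog = sum(w * f for w, f in zip(weights, freqs))

    def central(k):
        # k-th central moment around the center of gravity
        return sum(w * (f - cog) ** k for w, f in zip(weights, freqs))

    variance = central(2)
    if variance == 0:
        raise ValueError("higher moments are undefined for a single-frequency spectrum")
    sd = sqrt(variance)
    skewness = central(3) / sd ** 3
    excess_kurtosis = central(4) / variance ** 2 - 3
    return cog, sd, skewness, excess_kurtosis

# Toy spectra (hypothetical): frequencies in Hz with relative power
freqs = [1000, 2000, 3000, 4000, 5000]
high_heavy = [1, 1, 2, 4, 8]   # energy concentrated at high frequencies
symmetric = [1, 2, 4, 2, 1]    # energy balanced around 3000 Hz

print(spectral_moments(freqs, high_heavy))
print(spectral_moments(freqs, symmetric))
```

In this formulation, a spectrum whose energy is concentrated in the higher frequencies, with a tail toward the lower ones, yields a negative skewness value, while a spectrum balanced around its mean yields a value near zero.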

That no other significant main effects were found for speaker group in the remaining parameters is meaningful because it highlights the nuanced interaction of vowel type and FVR in conditioning PFFE among CF speakers and demonstrates the extent to which L2 PFFE is similar in nearly every perceptible phonetic way to L1 PFFE. It also marks a break with a previous study (Dalola & Bridwell 2020) that reported speaker group differences in the category of COG. A comparison of methodologies reveals that such differences likely stem from the treatment of speaker as a random effect in the current COG analysis in lieu of a simple normalization of COG scores. Future work should be explicitly designed to test these methods against one another to determine which holds not only more phonetic validity, but more sociophonetic validity.

#### **4.2 Other Differences: The case of /y/**

Previous research (Dalola & Bridwell 2020) reported divergent behavior for the treatment of /y/ across speaker groups. This analysis reports categorical speaker group differences for /y/ in terms of skewness (as discussed above), as well as for kurtosis and for intensity at higher FVRs. Given that kurtosis is a metric of the distribution of frequencies at the tails of the distribution, the higher measures reported for the L2 group at higher FVRs indicate an L2 PFFE production for /y/ at higher FVRs that is less negatively kurtotic (the kurtosis scores are still negative) because it is characterized by a greater number of frequencies concentrated around the distribution mean than is found in L1 speakers, in different vowels and at lower FVRs. Although both L1 and L2 groups yielded negative kurtosis scores (meaning an overall greater number of frequencies were located towards the tails than towards the mean), the lesser degree of eccentricity in the frequencies of the L2 speakers suggests a higher degree of uniformity in L2 /y/ PFFE production than in L1.

/y/ also proved exceptional in terms of measures of standard deviation, where it was observed that both speaker groups exhibited a decrease from lower to higher FVRs, indicating a greater degree of variation in the frequencies at lower FVRs and a lesser degree of variation in the frequencies among higher FVRs. This finding contrasted with the other target vowels, which instead witnessed an increase in measures of standard deviation from lower to higher FVRs, indicating a greater degree of variation in the frequencies as FVRs increased.

PFFE for /y/ also exhibited a special status in terms of intensity at higher FVRs, with L1 speakers realizing it significantly louder than both /i/ and /u/, and L2 speakers realizing it at the same level of loudness as /i/, both of which were significantly louder than /u/. Taken together, these findings suggest that the higher overall intensity of /y/ may reflect a sort of "hyperspeech" (Lindblom 1990), in which CF speakers signal to their interlocutors their awareness of PFFE as a sociophonetic marker of polished and affected French, and, in doing so, overemphasize certain phonetic features at the expense of maximum articulatory effort. Dalola & Bridwell (2020) originally suggested that this behavior might be unique to L2 speakers, given the increased markedness of /y/ for non-native CF speakers and this group's elevated measures of COG at lower FVRs (not corroborated here), but the present analysis indicates that the behavior may instead be characteristic of the vowel /y/ and shared across speaker groups. This theory is supported by previous work examining L1 and L2 speakers' perceptions of the variable, in which it was found that both L1 and L2 speakers construe PFFE as being associated with features of trustworthiness and formality (Dalola 2016), the second of which has notable social capital for CF speakers in their daily and/or professional life (the context in which they were sampled). Similar sociophonetic behavior has been found in white Southern Americans using hyperarticulated [hw] to index educatedness (Bridwell 2019), a phonetic behavior characterized by increased duration of the fricative portion of the segment.

#### **4.3 Implications**

In light of this study's findings, we are now able to suggest four additional parameters to the definition of "sociophonetic competence" as laid out by Dalola & Bullock (2017) and as revised via the addition of COG measures by Dalola & Bridwell (2019, 2020). Previous work on the sociophonetic variable of PFFE in CF has demonstrated that it is not merely sufficient for L2 speakers to have awareness of a sociophonetic variable in their L2 to use it at similar rates or durations as their L1 counterparts, or even in the same types of pragmatic and phonological contexts. This study has instead identified an additional dimension of L2 mastery, namely that of phonetic quality of use, and defined it using four novel acoustic metrics relevant to the PFFE variable (SD, skewness, kurtosis, intensity – COG was first added in Dalola & Bridwell 2019). Such a mastery at the level of production would also imply a heightened sensitivity to the perception of these sound variations, affording speakers the ability to decode an additional layer of meaning in an L2, although it is likely that the perception of these phonetic differences precedes their target-like production (Flege & Hillenbrand 1984). The previous sociophonetic work that has examined various spectral moments and found them to work alone or in concert with other parameters to index information about the speaker – COG and f0 as markers of masculine voices (Zimman 2017), negatively skewed /s/ (i.e. fronter, higher frequency) as a marker of gay men's speech (Munson et al. 2006), intensity (loudness) values in PFFE segments as markers of negative affect (Dalola 2016) – suggests that the present findings may not only be indicative of speaker group status, but also constructs of gender, sexuality, and affect. Whereas the acoustic energy of PFFE realizations seems to vary allophonically for L1 French speakers, as predicted purely by phonological context and ease of articulation, it seems to vary sociophonetically and pragmatically for L2 speakers, as conditioned by the desire for speakers to signal their sociophonetic awareness to native listeners at structurally and pragmatically acceptable moments.

#### **5 Conclusion**

Future studies will sample advanced L2 French populations more robustly and subdivide their level of advancedness via quantitative measures, i.e., the Bilingual Language Profile (Birdsong et al. 2012). The high levels of variability in L2 PFFE documented here may be optimized by a more rigorous and nuanced classification of speaker group. Additionally, we propose that the current findings be tested via a series of perceptual studies that investigate the pragmatic values of PFFE with differing and controlled measures of COG, SD, skewness, kurtosis, and intensity in both L1- and L2-French populations. In that way, we will be able to isolate which phonetic components of PFFE contribute most reliably and meaningfully to perceptual differences and which ones represent mere physiological variation. Finally, it is imperative that the phonetic and lab phonological approaches currently popular in this type of research seek to combine with or add methods from third-wave sociolinguistics (Eckert 2012), so that the constructs of gender, sexuality, and affect may be investigated more closely with relation to the PFFE variable.

#### **Acknowledgements**

We would like to thank our reviewers and the attendees at LSRL50 for their detailed and thoughtful feedback. Any remaining errors are our own.

#### **References**

Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects models using lme4. *Journal of Statistical Software* 67. 1–48.


Zimman, Lal. 2017. Gender as stylistic bricolage: Transmasculine voices and the relationship between fundamental frequency and /s/. *Language in Society* 46(3). 339–370.

## **Chapter 11**

## **Does social identity play a role in the L2 acquisition of French intonation? Preliminary data from Canadian French-as-a-second-language classroom learners**

### Hilary Walton<sup>a</sup>

<sup>a</sup>University of Toronto

This exploratory study seeks to examine the role of social identity in the acquisition of French intonation, specifically in the realization of the final pitch accent and overall shape of the pitch contour of non-final accentual phrases, among Canadian L2 learners of French. The primary objectives are to investigate the potential role of a social group-based accommodation effect in the acquisition of non-target-like speech and to identify any unique features in the French intonation contours of French immersion versus core French speakers. To such ends, two groups of six Anglophone learners of French having graduated from either a French immersion or a core French program completed a social identity questionnaire and a delayed sentence repetition task. The questionnaire results suggest that French immersion speakers have greater ingroup identification with their French program than their core French counterparts, particularly as concerns their emotional and psychological attachment to their program and peers. Due to the small sample size, differences in the French intonation contours of these learner groups were not significant and require further investigation. The results of this study expand our understanding of the role of sociological factors, in the present instance social identity, as a potential difference between L2 learner groups, and this is the first study to suggest a potential interaction between social identity and the production of linguistic features in an L2 context.

Hilary Walton. 2023. Does social identity play a role in the L2 acquisition of French intonation? Preliminary data from Canadian French-as-a-second-language classroom learners. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 221–247. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525112

#### **1 Introduction**

Variability in second language (L2) learning and ultimate attainment due to individual differences is an area of great interest to linguistic researchers and language instructors. Studies examine the role of learner characteristics (e.g., motivation, language aptitude, learning styles and strategies) and their interaction with contextual circumstances to explore individual engagement with the language learning process (Dörnyei 2005). Among the potential individual differences contributing to inter-learner variability is social identity, defined as the strength of an individual's cognitive and psychological attachment to a particular social ingroup (Tajfel 1978). To date, the effects of social identity on linguistic behaviour have primarily been investigated in social psychology research, which has linked social identity with significant implications for group-level perceptions, attitudes and behaviours (e.g., Perreault & Bourhis 1999, Reay 2010, Spears et al. 1999, Tajfel et al. 1971). The present study explores whether the behavioural effects associated with social identity extend to speech when the social identity in question is derived from a particular language learning context. To this end, the L2 acquisition of French intonation contours by classroom learners having completed different high school French-as-a-second-language (FSL) programs is examined. Social identity was assessed using a questionnaire modified from Leach et al. (2008) that served to evaluate speakers' identification with their particular FSL program and their same-program peers, while an acoustic analysis of the French intonation contours of the learners was performed to determine any program-specific linguistic features. The production results were then analyzed in combination with the results of the social identity questionnaire to explore the interaction between the social identity derived from speakers' FSL program and particular intonation features.

The primary objective of this paper is to introduce social identity as a new individual difference in L2 acquisition using data from the first participants (n=12) in a larger study. This small sample size does limit potential findings; however, the data serve to illustrate the novel idea that individuals' membership in a particular language learning education program may result in a specific program-based social identity that may, in turn, be linked to the L2 acquisition of speech.

The present paper is organized as follows. First, I present the construct of social identity and its potential implications for language learning (§2.1). Next, I characterize the language learning contexts of the populations of study, namely French immersion and core French programs, and summarize previous work targeting differences between the French speech of learners in these programs (§2.2). I then outline the methodology (§3) and present the results (§4). Lastly, I evaluate the hypotheses of study (§5) and conclude by positioning this work in the current L2 learning literature and discussing future avenues of research (§6).

#### **2 Background**

#### **2.1 Social identity as a predictor of linguistic behaviour**

As with other individual differences, social identity has potential implications for individuals' language acquisition and ultimate attainment. Indeed, it may result in the acquisition of particular group-level speech patterns through a process of social group-based accommodation. In the context of the current study, social identity refers to the "part of an individual's self-concept which derives from his [or her] knowledge of his [or her] membership [in] a social group (or groups) together with the value and emotional significance attached to that membership" (Tajfel 1978: 63). A social group consists of two or more people who "identify themselves in the same way and have the same definition of who they are, what attributes they have, and how they relate to and differ from specific outgroups" (Hogg et al. 2004: 251). According to Social Identity Theory (SIT; Tajfel 1978, Tajfel & Turner 1979), a prominent framework in the field of social psychology, individuals self-construct any number of social identities based on common attributes with others who share a social category (e.g., players on a soccer team) which, in turn, results in satisfaction and positive self-esteem for the group and its members. As individuals amplify the positive characteristics of group members and group norms, they reduce uncertainties about themselves and their identities (Hogg 2012). This results in self-enhancement and positive self-esteem and may lead to feelings of superiority over non-members (i.e., positive distinctiveness; Brewer 1991, Leonardelli et al. 2010). When social identities are salient, they can have implications for group-level phenomena, as they create group norms such as behaviours, beliefs, and attitudes (Hogg 2018). Such behaviours and attitudes are often established as a means of distinguishing members of one social group from those of other related social groups (Hogg et al. 2004). Reicher et al. (2010) explain that the prototypical behaviours of a social group are typically based on factors that shape the salience and expression of its particular identity and serve to differentiate a given group from opposing social groups. For example, in a high school with a rivalry between the football and basketball teams, players would form their identity based on the particular sport that they play. Although all players would (theoretically) have similar identities related to their particular school, age and athletic ability, it is the sport that would serve as the dimension of comparison between these two teams and influence the prototypical behaviours and attitudes of each group. Thus, players of each team would actively converge on attitudes and behaviour that would enhance the identity associated with their sport, while simultaneously distancing themselves from being associated with players of another team. Accordingly, it is expected that social groups based on language learning programs may be associated with specific linguistic traits. Indeed, such features may differentiate the speech of learners belonging to opposing language learning program-based social groups.

Language has long been established as an identity marker in all types of national, local, socioeconomic, educational, and occupational groupings (e.g., Dunbar 2003, Labov 1972, 2001, Mange et al. 2009, Roberts 2013, Sankoff et al. 1997). In social groupings, individuals may modify their speech due to a desire to feel a sense of belonging with the members of their social group and to maximize the distinction between themselves and members of the outgroup (Ehala 2018). According to the Communication Accommodation Theory (Giles 1973, Giles et al. 1991), such modifications occur through a process of accommodation in which speakers adapt their speech to the communication style (e.g., accent, tempo, gestures, nonverbal communication) of their interlocutors either through convergence (i.e., becoming more similar) or divergence (i.e., becoming less similar). It is expected that the speech of individuals sharing a social identity would be subject to social group-based accommodation: a process in which speakers converge on the communication style of their fellow ingroup members and diverge from that of outgroup members. This would create a normative and unique within-group speech style that would, in turn, reinforce individuals' group membership, promote the distinctiveness of their social identity (Ehala 2018) and strengthen their self-esteem (Hogg et al. 2004).

Cases of linguistic accommodation have been reported across a wide range of speech features, gestures, and body language in L1 (first language) speech (Ehala 2018), including phonetic (e.g., Babel 2012, Nielsen 2011, Zellou et al. 2016) and prosodic structures (Giles et al. 1991, Levitan & Hirschberg 2011). Previous studies of social identity-based accommodation have found that the height of the low vowel /æ/ in Scottish English is significantly correlated with speakers' political party membership (Hall-Lew et al. 2017) and anti-/pro-institutional attitudes of engagement (Lawson 2011), and that vocalic features of Northern Irish English are influenced by speakers' religious identities (McCafferty 1999). Of the few studies of accommodation in classroom contexts, researchers found that all students enrolled in a French kindergarten class converged on similar usage of three nonstandard variants (word-final /ʁ/, the realization of /l/ in third person pronouns, and optional liaisons) of the most socially integrated students (Nardy et al. 2014). Looking at the speech of an older group of students, Eckert (1989, 2008) found that American high school students converged on their use of standard and vernacular forms (e.g., "walking" versus "walkin", marked pronunciation of word-final /t/) with peers in their same social groupings (e.g., "jocks", "burnouts", "nerds"). As of yet, no study has investigated linguistic accommodation or the influence of social groupings on speech features in an L2 context. As such, the current study is the first to explore the role of social group-based identity as a potential factor in L2 speech.

#### **2.2 An overview of Canadian French-as-a-second-language programs**

In the province of Ontario, Canada, there are two primary FSL programs: French immersion and core French. These formal language learning programs are offered in English-majority communities to students of all linguistic backgrounds who rarely use French outside of class time (Genesee 1978). With regard to the structure of these programs, core French students complete mandatory French language education courses between Grades 4 and 9, after which French classes are made optional until the end of Grade 12. In contrast, French immersion programs begin in either Grade 1 or Grade 2 of elementary school and students complete a minimum of 5,000 hours of French instruction in a range of academic subjects (e.g., history, geography, science) by the end of Grade 12 (Canadian Parents for French 2017). Although the French courses differ between programs, students in French immersion and core French housed in the same school typically share the same teachers, who are often a mix of native speakers and advanced L2 learners (Netelenbos et al. 2016). As is perhaps expected, due to students' age of acquisition and linguistic environment, both Anglophone and heritage language students in these programs typically have English as their dominant language (e.g., Birdsong 2014).

It is in these FSL contexts that students form social groups and explore their social identities. Although classroom identities are rarely researched in social psychology, it is widely accepted that school plays a critical role in the formation of a context-specific social identity as students, particularly secondary students (McLeod 2000), create an image for themselves. Perry (2002) explains that schools provide children and young adults with norms, practices, experiences and relationships that help them to form their identities. Furthermore, institutions also provide practical knowledge to individuals about their specific social position or category, which may in turn help them to develop a sense of self and identity (MacKinnon & Heise 2010). According to Jenkins (2014), classroom social identities may be particularly salient due to the organized network of recognizable roles within school contexts. When applied to the current study, categories such as "student", "French immersion", and "core French" are established by the particular institutional and educational contexts. Students may, therefore, assign weight to their FSL categorization and internalize their membership in a specific program as a social identity.

#### **2.3 Acquisition of French intonation contours by Anglophone learners**

The intonation contours of French utterance-non-final accentual phrases (APs) in declarative sentences were selected as the target linguistic structure of study because they differ in phonetic realization from the learners' L1 English and have proven to be problematic for Anglophone learners of French (Colantoni et al. 2014, Lepetit 1989, Sunara 2018). As such, these contours are more likely to be characterized by non-target-like production and thus, differences between French immersion and core French learners are more likely to occur. Specifically, this study examines two phonetic parameters: (i) the type of the final pitch accent and (ii) the overall shape of the pitch contour.

According to Jun & Fougeron (2000, 2002), APs are the basic prosodic category in French. They group together phonological words corresponding to syntactic phrases that make up the Intermediate Phrase and the Intonational Phrase (IP). French APs typically consist of 3 or 4 syllables but can contain up to a maximum of 8 syllables (Jun & Fougeron 2002). APs are marked by a maximum of one final pitch accent (i.e., a local intonation feature associated with a particular syllable) that is represented by a tone, either high (H), low (L) or a combination of these (H+L; Jun 2005), followed by an asterisk "\*" to indicate its association with a metrically strong syllable (Sunara 2018). Current models of French prosody (Jun & Fougeron 2002, Welby 2006) characterize French APs by an obligatory rise associated with the final accented syllable (H\*) which may be accompanied by a rise on the initial accented syllable, resulting in either LH\* or LHiLH\* tonal patterns for non-final APs. It is the final rise in fundamental frequency and vowel lengthening that serve as the primary characteristics of the French AP (Sunara 2018). In contrast to French, English does not include APs in its prosodic hierarchy (Pierrehumbert & Hirschberg 1990) and instead assigns prominence at the word-level via the lexical stress system (e.g., Hayes 1995) realized as higher pitch and longer vowel duration. English tends towards falling pitch accents that are associated with stressed syllables, resulting in H\*L patterns within prosodic words (Bullock 2009). These differences between languages result in opposing tendencies: French favors AP-final prominence while English tends towards word-initial prominence (Clopper 2002).

More generally, the differences in the intonation patterns of French and English can also be described with reference to the overall shape of the pitch contour. French declarative sentences are characterized by a pitch rise at the end of each non-final AP and a falling pitch on the final syllable of the final AP (e.g., D'Imperio et al. 2012). In contrast, both final and non-final APs in English are marked by a fall in pitch. Most important for the present study is that the end of non-final APs is signaled by a rise in pitch in French but by a fall in pitch in English.
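The rise/fall contrast described above can be illustrated with a toy classifier that labels the pitch movement across a stretch of fundamental frequency (f0) samples. This is only a sketch and not the acoustic analysis used in this study: the function name, the semitone conversion, the 1-semitone threshold, and the sample f0 values are all assumptions chosen for illustration.

```python
from math import log2

def final_movement(f0_hz, threshold_st=1.0):
    """Label the pitch movement from the first to the last f0 sample (in Hz).

    The change is expressed in semitones; movements smaller than the
    (hypothetical) threshold are labeled "level".
    """
    if len(f0_hz) < 2 or min(f0_hz) <= 0:
        raise ValueError("need at least two positive f0 samples")
    change_st = 12 * log2(f0_hz[-1] / f0_hz[0])
    if change_st > threshold_st:
        return "rise"
    if change_st < -threshold_st:
        return "fall"
    return "level"

# Hypothetical f0 samples (Hz) over the last syllable of a non-final AP
print(final_movement([180.0, 195.0, 220.0]))  # French-like pattern
print(final_movement([220.0, 200.0, 180.0]))  # English-like pattern
```

Under these assumptions, a target-like French non-final AP would be labeled as a rise, whereas a contour carried over from English would be labeled as a fall.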

Previous studies have found that, due to crosslinguistic influence (CLI), the English-French differences outlined above result in non-target-like realizations of French intonation among Anglophone learners. Lepetit (1989) examined nonfinal APs in declarative sentences using a reading task and found that Anglophone learners had difficulty realizing target-like final pitch accents as well as producing target-like pitch contours for French utterances containing more than one AP. Moreover, Colantoni et al. (2014) found that Anglophone learners of French had difficulty producing APs of the appropriate phonological size. Given that the L2 learners included in the present study were quite experienced with French, I do not expect all speakers to be influenced by the effects of CLI. Rather, I predict that CLI may influence the intonation patterns of some speakers, who will, as a result, carry over their L1 intonation patterns into their L2 to produce non-target-like realizations of French intonation patterns (e.g., falling pitch accents).

#### **2.4 Current study**

I propose here that the structural differences between French immersion and core French programs, including program length and degree of exposure to fellow L2 learners, promote different levels of ingroup identification for several reasons. First, because French immersion students complete the majority of their academic subjects in French, they are separated from their peers enrolled in the same school during class time throughout their education. Second, French immersion students remain in the same cohort for the entirety of their program, which creates close-knit social groups (Lyster 1987). I do not expect similar groups to exist in core French programs, which typically have much larger cohorts and allow for greater mixing with students outside the program as compared to French immersion programs. Thus, French immersion students are expected to have higher levels of identification with their FSL program membership than core French students (Hypothesis 1).

In addition to potential differences in social identity, I expect to find distinct features in the L2 speech of French immersion and core French speakers. Anecdotally, the speech of French immersion learners has been referred to as an "immersion inter-language" (Lyster 1987) and is suggested to be unique due to production difficulties caused by L1 English influence (e.g., Genesee 1978, Lyster 1987). As concerns FSL learners' speech in particular, Poljak (2015) found that native speakers of Canadian French were able to identify and distinguish between the spontaneous speech and sentence reading of French immersion and core French speakers in a program identification forced-choice task. These findings indicate that these learner groups have distinct (non-native) accents. Consequently, I hypothesize that there will be significant differences in the French speech of French immersion and core French speakers (Hypothesis 2), but given the lack of previous studies on these groups' L2 intonation, I remain unable to make specific predictions regarding each group's L2 production.

Finally, I seek to investigate the possibility that, if differences in the L2 intonation contours of French immersion and core French speakers exist, they will correspond to the social identity results (Hypothesis 3).

In summary, the present study tests the following three hypotheses. First, French immersion speakers will show greater levels of identification with their FSL program than core French speakers due to the close-knit relationships that they form with their peers in such intensive language programs (Hypothesis 1). Second, there will be measurable differences between French immersion and core French learners' speech, specifically the phonetic realization of non-final APs (Hypothesis 2). Lastly, L2 intonation production differences between French immersion and core French speakers will be associated with the strength of their identification with their FSL program and their same-program peers (Hypothesis 3).

#### **3 Methodology**

The current study examines data from 12 female students at the University of Toronto who had completed either a French immersion (*mean age: 19.5 years*) or a core French (*mean age: 20.3 years*) high school program. All participants attended secondary school in the Greater Toronto Area. It should be noted that because participants were recruited at the university level, I was not able to select students who had attended the same schools or had shared the same French teachers (see §5 for potential implications). Due to the small sample size and participant availability, only female participants were included in this analysis to neutralize any effects of gender. All participants reported having English as their dominant language and none reported any significant time spent in a French community. Participants were tested individually in a quiet classroom at the University of Toronto while seated at a computer for the entirety of the study. As a part of a larger research project, all participants completed five experimental tasks: (i) sentence reading; (ii) passage reading; (iii) an interactive map activity; (iv) delayed sentence repetition; and (v) ingroup identity questionnaire. The data analyzed here are taken from the final two tasks.

The questionnaire was a slightly modified version of the ingroup identification questionnaire developed by Leach et al. (2008) and was used to measure participants' ingroup identification with their particular FSL program ingroup (for a full list of questions, see Appendix A). This questionnaire, which consists of 14 Likert scale questions assessed on a scale from 1 (*Strongly disagree*) to 7 (*Strongly agree*), evaluates social identity across two constructs based on the definition of social identity proposed by the SIT. The first construct, group-level self-investment, assesses individuals' emotional and psychological connections with their ingroup (FSL program) and its members, while the second construct, group-level self-definition, reflects individuals' perceived commonalities with their fellow ingroup members (same-program peers).

The delayed sentence repetition task (e.g., Trofimovich & Baker 2006, 2007) was used to elicit controlled speech containing natural intonation contours. For this task, 20 declarative French sentences containing three or four APs (e.g., [*Aurélie*]AP [*deviendra*]AP [*biologiste*]AP. 'Aurélie will become a biologist.') were created based on a subset of sentences from Michelas & D'Imperio (2012). Only sentences containing three APs were used for the present study (n=10; see Appendix B). All target sentences were recorded by a female native speaker of Canadian French and then, to ensure that participants could not imitate the prosody of the recording, they were modified to make pitch and syllable duration parameters uniform. To create the auditory stimuli, the pitch of all target sentences was normalized to the native speaker's mean pitch (180 Hz), and all unstressed CV and CVC syllables were normalized to the native speaker's mean values (185 ms and 300 ms, respectively). In this task, participants were told that they would hear French sentences pronounced in a "robot voice" and that they should then repeat them aloud in their own voice as naturally as possible. Participants heard each sentence twice. The first recording of each sentence was accompanied by its written form to ensure that there were no misperceptions. Then, the orthography disappeared, and the recording was played a second time. Next, after a pause of 3 seconds, participants heard a beep and *Répétez* 'Repeat' was displayed on the screen, prompting participants to repeat the sentence. Along with the normalized auditory stimuli, the 3-second pause served to prevent individuals from mimicking the auditory stimuli as they heard them and thus required participants to produce sentences using their phonological and phonetic grammars (e.g., Trofimovich & Baker 2007).

The intonation analysis consisted of 240 non-final APs (10 sentences × 2 non-final APs × 12 speakers). Of these productions, four were excluded due to difficulties repeating the target AP, resulting in a total of 236 APs for analysis. Using ToBI for French annotation (Delais-Roussarie et al. 2015) and PRAAT (Boersma & Weenink 2021), a TextGrid containing tiers of annotation for 1) pitch contour; 2) orthography; 3) prosodic breaks (i.e., AP and IP boundaries); and 4) AP number within the utterance was created for each sentence. The author manually identified the final pitch accent of each AP as an H or an L tone and classified the overall shape of each pitch contour as either rising, falling or sustained pitch. Speakers' production was classified as target-like if it included a H\* final pitch accent and a rising overall pitch contour. The present analysis did not include any quantitative measurements, but these could be examined in a future study.
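The token count and the target-like criterion described above can be restated as a short sketch. It is an illustration only: the annotation in the study was done manually in PRAAT, and the constant and function names below are hypothetical.

```python
# Design figures taken from the text; the names are illustrative assumptions.
N_SENTENCES = 10                 # three-AP target sentences
NON_FINAL_APS_PER_SENTENCE = 2   # initial and medial AP
N_SPEAKERS = 12
N_EXCLUDED = 4                   # productions dropped for repetition difficulties


def total_tokens() -> int:
    """Number of non-final APs elicited before exclusions."""
    return N_SENTENCES * NON_FINAL_APS_PER_SENTENCE * N_SPEAKERS


def analysed_tokens() -> int:
    """Number of non-final APs retained for analysis."""
    return total_tokens() - N_EXCLUDED


def is_target_like(pitch_accent: str, contour: str) -> bool:
    """A production counts as target-like only with an H* final pitch
    accent AND a rising overall contour, as stated in the text."""
    return pitch_accent == "H*" and contour == "rising"


print(total_tokens(), analysed_tokens())
print(is_target_like("H*", "rising"), is_target_like("H*", "sustained"))
```

Under this criterion, an AP with an H\* accent but a sustained contour is still counted as non-target-like, because both conditions must hold.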

#### **4 Results**

In this section, I present the results of the ingroup identification questionnaire and the intonation analyses. As mentioned in the introduction, the data included here is taken from the initial stages of a larger project, and thus, the sample size is small. With only six participants per speaker group, this section focuses on absolute differences observed in the data; inferential statistics are not justified but will be provided as footnotes for interested readers.
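Several footnotes in this section report Mann-Whitney U tests on the Likert data. As a rough illustration of what that statistic measures, the sketch below computes U from midranks in plain Python. The scores are invented for the example and are not the study's data, the function name is hypothetical, and no p-value is computed.

```python
def mann_whitney_u(xs, ys):
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    pooled = sorted(list(xs) + list(ys))

    # Assign each distinct value the average of the 1-based ranks it occupies.
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    n_x, n_y = len(xs), len(ys)
    rank_sum_x = sum(midrank[x] for x in xs)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Invented Likert responses (1-7) for two hypothetical groups.
group_a = [6, 7, 7, 5]
group_b = [3, 4, 5, 2]
print(mann_whitney_u(group_a, group_b))
```

U_x counts the pairs in which a score from the first group exceeds one from the second, with ties counted as one half. A value of U_x near the product of the two sample sizes therefore means the first group's responses are almost always higher.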

#### **4.1 Hypothesis 1: Ingroup identification**

Hypothesis 1 predicted that French immersion students, who spend much more time with their same-program peers in an intensive French learning context, would report higher levels of ingroup identification with their FSL program than core French students, for whom there is greater diversity among their French-learning peers. Figure 1 offers an in-depth view of the overall reported responses by presenting the median Likert scores of each speaker group for each question. Recall that the seven-point Likert scale ranged from "Strongly disagree" (1) to "Strongly agree" (7) with questions 1 to 10 and 11 to 14 targeting the self-investment and self-definition constructs, respectively.

Figure 1: French immersion and core French speakers' median Likert responses (1–7) for each questionnaire item (Questions 1–10: self-investment; Questions 11–14: self-definition)
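The split of the 14 items into two constructs can be sketched as a small scoring helper. This is a minimal illustration, assuming one respondent's answers are stored as a mapping from item number to Likert score; the function name and the example responses are hypothetical and are not taken from the study.

```python
from statistics import median

# Item ranges as given in the text: 1-10 self-investment, 11-14 self-definition.
SELF_INVESTMENT_ITEMS = range(1, 11)
SELF_DEFINITION_ITEMS = range(11, 15)


def construct_medians(responses):
    """Return the median Likert score per construct for one respondent.

    `responses` maps item number (1-14) to a score from 1 to 7.
    """
    if set(responses) != set(range(1, 15)):
        raise ValueError("expected responses for items 1 through 14")
    if any(not 1 <= score <= 7 for score in responses.values()):
        raise ValueError("Likert scores must lie between 1 and 7")
    return {
        "self_investment": median(responses[i] for i in SELF_INVESTMENT_ITEMS),
        "self_definition": median(responses[i] for i in SELF_DEFINITION_ITEMS),
    }


# Invented respondent: agrees on investment items, is neutral on definition items.
example = {i: 6 for i in SELF_INVESTMENT_ITEMS}
example.update({i: 4 for i in SELF_DEFINITION_ITEMS})
print(construct_medians(example))
```

Medians are used here, rather than means, because Likert responses are ordinal, which matches the summaries reported in the figures.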

The median Likert scores of French immersion participants are higher than those of core French participants for the majority (10) of the questions. For questions targeting the self-investment construct (1–10), French immersion participants reported levels of agreement that were greater than or equal to those of core French participants in all cases. This pattern does not hold true for the results of the self-definition construct (Questions 11–14), however. Here, French immersion participants responded with lower levels of agreement as compared to their responses to the self-investment questions, resulting in more comparable medians between groups for this construct. This was also the only construct in which the core French participants reported higher median scores than those of the French immersion speakers for some (2) of the questions.

When analyzing the results of the self-investment construct in isolation, as shown in Figure 2, a clear distinction between participant groups emerges. French immersion participants reported higher levels of agreement than core French participants with questions targeting group-level self-investment.<sup>1</sup>

The reported Likert responses for the self-definition questions are lower than those of the self-investment questions. As shown in Figure 3, no participant selected "Strongly agree" for any of the questions targeting this construct.

<sup>1</sup>A Mann-Whitney U test determined that between-group differences were significant (*p*<0.001).

Figure 2: French immersion and core French speakers' reported level of agreement (%) for questions targeting positive self-investment in FSL program membership

Figure 3 shows that 25% of core French participants reported "Agree" to such questions as compared to only 13% of French immersion participants.<sup>2</sup> However, 54% of French immersion participants selected "Somewhat agree" for these same questions as compared to 33% of core French participants. A clear pattern does not emerge from this subset of the data.

The results of the ingroup identification questionnaire show that French immersion speakers have higher levels of ingroup identification with their FSL program than their core French counterparts overall. This pattern is particularly clear for the results of the questionnaire's self-investment construct, which measures individuals' psychological attachment to their FSL ingroup and its members. Indeed, for this construct, French immersion participants reported median levels of agreement that were equal to or surpassed those of the core French participants with every questionnaire item. In contrast, the results of the self-definition construct, which measures individuals' perceived commonalities with their same-program peers, do not present a clear distinction between the FSL groups.

#### **4.2 Hypothesis 2: French intonation contours**

Based on the work of Poljak (2015), investigating the general accent of FSL learners, Hypothesis 2 proposed that there would be differences in the realization of the intonation contours of French declarative sentences between French immersion and core French speakers. In order to test this hypothesis, I examined the type of final pitch accent and the shape of the overall pitch contour of non-final APs.

Table 1 presents the percentage of non-target-like pitch accents (L\*) that were present in the data and organizes them by the AP's position in the utterance.


Table 1: French immersion and core French final pitch accent errors (%) by position of the AP in a sentence repetition task

As shown in Table 1, both French immersion and core French participants demonstrated great difficulty producing target-like pitch accents. In absolute terms, core French speakers were less accurate in their production of phrase-final pitch accents than French immersion speakers in both initial and medial APs with error rates of 47% and 57%, respectively. French immersion speakers produced non-target-like final pitch accents in 43% and 50% of the initial and medial APs, respectively.<sup>3</sup>

<sup>2</sup>A Mann-Whitney U test determined that between-group differences were non-significant (*p*>0.05).

I now turn to the analysis of the overall shape of the pitch contours. Three distinct pitch contours were observed: (i) target-like rising; (ii) non-target-like falling; and (iii) non-target-like sustained contours. Figure 4 presents the percentage of each pitch contour that was observed for each FSL speaker group. Both speaker groups produced the target-like pitch contours in less than 50% of instances (core French: 46%; French immersion: 41%). Both groups also showed relatively low proportions of the non-target-like falling contour (French immersion: 10%; core French: 15%) compared to their realizations of the non-target-like sustained contours (core French: 40%; French immersion: 49%).<sup>4</sup>

Figure 4: Percentage (%) of pitch contour types for French immersion and core French speakers in a sentence repetition task

In sum, the results of the final pitch accent and pitch contour analyses show small, non-significant differences between the groups: French immersion speakers produced (i) a greater proportion of target-like final pitch accents, (ii) fewer rising pitch contours and (iii) a greater proportion of sustained pitch contours than core French learners.

<sup>3</sup>A Chi-squared test revealed the between-group differences were non-significant (*p*=0.51).

<sup>4</sup>A Chi-squared test revealed the between-group differences were non-significant (*p*=0.26).

#### **4.3 Hypothesis 3: Combined ingroup identification and production results**

Hypothesis 3 predicted that if between-group production differences were observed (Hypothesis 2), then they would correspond to the results of the ingroup identification questionnaire (Hypothesis 1). Because no significant differences in the realization of non-final APs were found between French immersion and core French speakers, either in type of pitch accent or overall shape of intonation contours, this section combines the production of all speakers to investigate the relationship between the production and the questionnaire results. I first present the combined questionnaire and final pitch accent results, followed by the results of the questionnaire combined with the overall pitch contours for all participants.

Figure 5 presents the median Likert responses for the productions of H\* and L\* final pitch accents in non-final APs for all participants.

Figure 5: Median reported self-investment Likert responses (1–7) for H\* and L\* final pitch accents in a sentence repetition task for French immersion and core French participants

Speakers who produced the non-target-like L\* final pitch accents reported a median score of 6 on the ingroup identification questionnaire's self-investment construct, which is higher than the median score of 5.5 reported by speakers who produced target-like H\* final pitch accents. The distributions between speakers who produced H\* and L\* pitch accents also differed, with greater variability in the identification scores of speakers who produced L\* pitch accents.<sup>5</sup>

<sup>5</sup>A Mann-Whitney U test determined that between-group differences were significant (*p*=0.002).

Figure 6 presents the median responses for the self-definition construct, for H\* and L\* final pitch accent productions for all speakers. The median response was 4.5 for all speakers, whether they produced a H\* or a L\* final pitch accent in their non-final APs.<sup>6</sup> Accordingly, learners' responses to the self-definition questions do not seem to interact with the production of final pitch accents.

Figure 6: Median reported self-definition Likert responses (1–7) for H\* and L\* final pitch accents in a sentence repetition task for all participants

In sum, a comparison of the data depicted in Figures 5 and 6 does not provide sufficient evidence to suggest that speakers' responses to either construct of the ingroup identification questionnaire are connected to the L2 realization of pitch accents in French. Differences in median scores between speakers who produced target-like and non-target-like pitch accents are minimal or negligible and do not allow for an extensive analysis.

I now examine the relationship between the questionnaire results and the overall shape of the pitch contours for all speakers. Figure 7 displays the median self-investment scores for all participants based on their realization of pitch contours as either rising, falling or sustained, while Figure 8 displays the same for the median self-definition scores.

As can be seen in Figure 7, participants who produced a non-target falling pitch contour reported a median Likert score of 5, while participants who produced a target-like rising contour reported a slightly higher median score of 5.5 and participants who produced sustained contours reported a median score of 6, the highest of the three. Participants who produced falling pitch contours reported a much wider range of responses to the self-identification questionnaire items as compared to participants who produced rising and sustained contours and who reported relatively high self-investment scores. This distribution suggests the presence of outliers; a larger sample size would help to account for individual differences in the data and allow for a more robust analysis (see §5).

<sup>6</sup>A Mann-Whitney U test determined that between-group differences were non-significant (*p*=0.65).

Figure 7: Median reported self-investment Likert responses (1–7) by overall pitch contour of non-final APs in a sentence repetition task for all participants

Despite slightly lower median scores, these same patterns were present in the self-definition data (Figure 8): participants who produced non-target falling pitch contours had the lowest reported Likert scores and the most variability in reported responses.<sup>7</sup>

<sup>7</sup>Mann-Whitney U tests determined that between-group differences were statistically significant for both the self-investment (*p*<0.001) and the self-definition (*p*=0.01) constructs.

Figure 8: Median reported self-definition Likert responses (1–7) by overall pitch contour of non-final APs in a sentence repetition task for all participants

#### **5 Discussion**

The present study evaluated three hypotheses using data from an ingroup identification questionnaire and a delayed sentence repetition task. Hypothesis 1 predicted that French immersion speakers would show greater levels of ingroup identification than core French speakers due to the close-knit relationships that they form with their same-program peers. As expected, the responses to the ingroup identification questionnaire differed between the learners in each program. French immersion participants reported higher levels of ingroup identification than their core French peers on the majority of the questionnaire items, suggesting that these speakers have a more prominent social identity associated with their FSL membership. This pattern was particularly clear for the results of the self-investment construct of the questionnaire which evaluates individuals' emotional and psychological connection to their ingroup. This is perhaps not surprising, as the relationships with their same-program peers discussed above are more likely to encourage emotional and psychological connections with their program and their peers (self-investment) than their view of shared attributes with their peers (self-definition).

Hypothesis 2 predicted that there would be measurable differences between French immersion and core French speakers' French intonation contours. Although both speaker groups yielded a high percentage of non-target-like realizations, the French immersion speakers were more accurate in their production of final pitch accents, while the core French speakers were more likely to produce a target-like rising pitch contour for non-final APs. However, the differences are not statistically significant, and due to the small sample size, speech differences between the speaker groups cannot be confirmed.

Lastly, Hypothesis 3 predicted that between-group speech differences would correspond to differences in ingroup identification scores between French immersion and core French learners. Because there were no verifiable between-program linguistic differences, the study combined the production of all participants to examine a potential interaction between the production and the questionnaire results. Absolute differences in participants' self-investment scores varied in relation to their production of H\* and L\* final pitch accents and the overall shape of pitch contours. Lower identity scores were reported for speakers who produced non-final APs with a non-target-like falling intonation contour, while higher social identity scores were reported for speakers who produced a target-like H\* final pitch accent. Despite such differences, the number of participants and the variability of the results do not allow the current study to describe a relationship between speech and ingroup identification.

Further consideration of the study reveals other limitations as well. First, as the participants were tested during their university studies, they had attended various schools and were taught by a variety of teachers. Furthermore, the current study did not include a measure of L2 proficiency; as such, it is possible that any observed speech differences may stem from differential exposure to the target language. It is also possible that individual factors (e.g., motivation, language aptitude) or CLI may have influenced individual results. Such factors weaken potential generalizations that ingroup identification may influence the speech of L2 learners. An extension of the present study should include a larger sample size and more homogeneous participant groups. All participants should be enrolled in FSL programs in the same school and be taught by the same teachers at the time of testing to control for factors such as input and cohort that could potentially influence the speech of individual learners. This design would ensure that these individuals belong to the same specific ingroup and would have a higher probability of capturing any linguistic features that may be unique to the particular cohort. Furthermore, participants should complete a validated measure of L2 proficiency to account for potential differences due to stage of acquisition. Future studies of production differences between these learner groups could examine a wide array of L2 phonetic and phonological features to increase the probability of identifying between-group speech differences.

#### **6 Conclusions**

While the findings of the present study are limited, the study nonetheless makes a significant contribution. Although previous studies have reported findings that social groups may promote linguistic accommodation (e.g., Eckert 1989, 2008, Hall-Lew et al. 2017, Lawson 2011, McCafferty 1999, Nardy et al. 2014), this is the first study to investigate this question in the field of L2 speech learning. Its results provide insights into the L2 learning experience of French immersion and core French speakers using a validated measure of social identity (Leach et al. 2008). It was found that L2 learners enrolled in different language learning programs reported measurable differences in their levels of identification with their particular FSL program, particularly with respect to their emotional connection with the program and their fellow ingroup members. This study also contributes to the current understanding of the previously suggested phonetico-phonological differences in the speech of FSL learners (Poljak 2015), which remain largely understudied.

This study offers preliminary insights into such topics, but due to its small sample size, further research is required to present a clear picture of the speech patterns of speakers in these programs and any potential link to individuals' identification with their social groups. In addition to evaluating the social identity of different L2 learner groups and the interaction between ingroup identification and L2 linguistic structures, the causality between these factors should be explored to determine whether social identity, through social group-based accommodation, contributes, in part, to production differences in the speech of different learner groups. Furthermore, within-group variability in the development of normative speech patterns of a social group should be investigated to determine whether individuals who identify most strongly with their ingroup are more likely to reflect the distinct linguistic structures of their particular social group.

#### **Appendix A List of questions included in social identity questionnaire**

Group-level self-investment


Group-level self-definition


#### **Appendix B List of 3-AP target sentences**


#### **References**


& Martha E. Pollack (eds.), *Intentions in communication*, 271–311. Cambridge, MA: MIT Press.


## **Chapter 12**

## **Sociophonetic analysis of mid front vowel production in Barcelona**

Annie Helms<sup>a</sup>

<sup>a</sup>University of California, Berkeley

Studies aimed at observing Spanish contact-induced changes in Catalan among Spanish-Catalan bilinguals in Catalonia have evidenced both assimilation and dissimilation in the production of Catalan mid front vowels. However, the general lack of studies aimed at observing Catalan contact-induced changes in Spanish overlooks bidirectionality as an expected outcome of language contact. Accordingly, the present study uses both Catalan and Spanish mid front vowel production data from Barcelona to investigate the roles of age, gender, and language dominance in the processes of assimilation (Catalan cognate effects in Spanish) and dissimilation (distinctly produced cross-linguistic mid front vowel categories). While no cognate effect was observed in Spanish, younger speakers maintain less distinction between Catalan mid front vowel categories than older speakers, and females have less overlap between Spanish /e/ and Catalan /ɛ/ than males. These results are consistent with a male-led change towards greater assimilation and more overlapping productions of all three mid front vowel categories. While cognitive factors such as language dominance and cognate status are central to models of bilingual phonetic representation, it is paramount to situate the bilingual individual within the context of the community and acknowledge the external social factors which also mediate variation in acquisition and production.

### **1 Introduction**

In Barcelona, Spanish and Catalan have been in close contact for centuries and many instances of lexical and phonological imposition and borrowing have been recorded (Galindo i Solé 2003, 2006, Arnal 2011). However, the nearly exclusive focus on the variable acquisition of Catalan in Spain has furthered a longstanding asymmetry favoring the study of Spanish influence in Catalan over Catalan influence in Spanish (Galindo i Solé 2003: 18). This trend in the literature, exemplified by Arnal (2011: 22) who states that "in the current situation of generalized bilingualism in Catalonia, the change caused by contact does not affect Spanish, but rather only affects Catalan", runs counter to the expectation of bidirectionality in language contact situations with widespread bilingualism (Thomason & Kaufman 1988, Davidson 2020). Furthermore, the lack of cross-linguistic production data in the literature limits the conclusions that can be drawn about outcomes of language contact, especially regarding the processes of assimilation and dissimilation theorized by the Speech Learning Model (SLM; Flege 1995) and the revised Speech Learning Model (SLM-r; Flege & Bohn 2021).

A variable studied among Spanish-Catalan bilinguals in which this asymmetry is often present is the production of mid vowels in Catalan. The mid front and mid back vowels of Catalan are contrastive, yielding minimal pairs (e.g. /net/ 'grandson' and /nɛt/ 'clean'; /os/ 'bear' and /ɔs/ 'bone'). Regarding the mid front vowels, with which this study is concerned, Spanish /e/ is produced lower (higher F1) than the Catalan /e/, but higher (lower F1) and more fronted (higher F2) than the Catalan /ɛ/ (Figure 1). However, the most significant sources of variability among the three vowel categories are found across F1, not F2 (Bosch & Ramon-Casas 2011, Cortés et al. 2019, Recasens & Espinosa 2006, Simonet 2011). Certain factors, such as cognate status, language dominance, and age have been linked to the variable production of these vowels cross-linguistically. Therefore, the production of these mid front vowels in bilingual settings has been the focus of many studies as it provides an opportunity to study outcomes of language contact, phonological representations of bilinguals, and the social factors that mediate these processes.

Figure 1: Vowel Spaces of Spanish (Ladefoged & Johnson 2015: 237) and Catalan (Carbonell & Llisterri 1999: 62)

#### **1.1 Models of phonological acquisition and representation**

The Speech Learning Model (SLM; Flege 1995) and the Revised Speech Learning Model (SLM-r; Flege & Bohn 2021) postulate that bilingual speakers do not maintain separate phonetic systems for each language; rather, the two systems co-exist in a mutual phonetic space and may influence one another. Whereas the SLM focuses on between-group differences and whether L2 speakers are able to produce target L1 categories, the SLM-r shifts focus to the individual and an individual's differentiation of L1 and L2 categories based on quantity and quality of input of L1 and L2.

#### **1.1.1 Assimilatory outcomes**

Both the SLM and SLM-r postulate that if phonetic differences between an L2 category and an L1 category are not perceived by a bilingual individual, the formation of a new category will be blocked. Blockage of category formation results in assimilation, by which the speaker may produce a composite L1-L2 category (Evans & Iverson 2007, Kendall & Fridland 2012). The acoustic properties of this composite category are "defined by the statistical regularities present in the combined distributions of the perceptually linked L1 and L2 sounds" (Flege & Bohn 2021: 41). A number of factors, such as the quality and quantity of L1 and L2 input the bilingual receives in their lifetime (Flege 2002, Yeni-Komshian et al. 2000), individual cognitive differences (Lev-Ari & Peperkamp 2013), or relative language activation (Grosjean 2001), will affect the overall acoustic profile of this composite sound. Additionally, assimilation may occur between two L2 categories when sufficient acoustic differences are not perceived or produced within the contrast.

#### **1.1.2 Dissimilatory outcomes**

Alternatively, the SLM and SLM-r describe the process of dissimilation, which may occur in the inventory of a bilingual that is able to perceive sufficient acoustic difference between the L1 and L2 categories, or between two L2 categories, thus preventing assimilation from occurring. In this case, the individual maintains distinct categories in each language. Categories may "deflect" one another to maintain contrast in the shared phonetic space (Baker & Trofimovich 2005, Flege & Bohn 2021), yielding categories that differ from those of a monolingual speaker. However, evidence of cross-linguistic dissimilation is less abundant in the literature as many studies report perception or production data from only one of the languages in question.

#### **1.2 Cognate effects**

Through a cognate effect, the phonology of cognate words in the non-target language may be activated during speech, potentially affecting the production of a word in the target language (Costa et al. 2000, Colomé & Miozzo 2010). Assimilation may be realized through cognate effects, where the influence of the L1 on L2 category production is strengthened in words that are cognate between the L1 and L2. Amengual (2016a) provides evidence for a cognate effect in the production of Catalan mid back vowels in Mallorca, where productions of Catalan /O/ with incongruent Spanish cognates are raised, evidencing assimilation towards Spanish /o/.

#### **1.3 Language dominance**

Language dominance is a measure of linguistic history, linguistic attitudes, language proficiency, and language use, and is often a predictor of category production and perception of bilinguals (Birdsong et al. 2012). Under the SLM and SLM-r (Flege 1995, Flege & Bohn 2021), language dominance, through language exposure and experience, can predict whether assimilation or dissimilation may occur. Amengual (2016b) provides evidence for language dominance as a predictor of assimilation, where Spanish-dominant bilinguals assimilated both Catalan /e/ and Catalan /E/ to Spanish /e/. Alternatively, Bosch & Ramon-Casas (2011) find that systematic and consistent exposure to Catalan (i.e., greater Catalan-dominance) contributes to the production of distinct Catalan mid front vowel categories by bilinguals raised in Spanish-dominant homes, thus providing evidence for dissimilation between two L2 categories. No cross-linguistic comparisons were performed in the study, so it is unknown whether language dominance additionally contributed to cross-linguistic dissimilation between Catalan /e/ and Spanish /e/ among these bilinguals.

#### **1.4 Age and gender**

Within the variationist sociolinguistic framework, social factors such as age and gender are correlated with sound changes in progress. Under the apparent-time construct (Bailey 2004), generational sound change can be observed by comparing the speech of older speakers with that of younger speakers, where the speakers pertain to separate generations (Labov 1994: 45–46). Patterns of language use across gender often are consistent with the Gender Paradox, where "women conform more closely than men to sociolinguistic norms that are overtly prescribed, but conform less than men when they are not" (Labov 2001: 292–293).

Therefore, in both changes from above and below the level of conscious awareness, women tend to be the leaders of change. The combination of this principle with the apparent-time construct results in the methodological practice of treating younger women's speech patterns as suggestive of possible community-wide changes in progress (Labov 2001: 279).

In Barcelona, age additionally correlates with access to explicit language instruction, as the Catalonian Linguistic Normalization Law of 1983 has yielded a generational divide between those that have and have not had access to Catalan instruction in school. Although most studies of Spanish and Catalan production in Catalonia focus on the effects of language dominance and other cognitive factors, Cortés et al. (2019) examine the language of preschool-aged children and their parents in three neighborhoods of Barcelona. They observe that the children's ability to produce Catalan mid vowels is most affected by the language environment (i.e., strength of Spanish-influence in neighborhood), whereas the production of adults is more affected by personal relationships and connections maintained in the present and the past. Research on the role of age and gender in vowel production in this bilingual community remains relatively scarce. Nevertheless, given increasing immigration and the documented mid front vowel merger in progress in some areas of Barcelona (Mora & Nadeu 2012), I predict that a potential change in progress, if observed, would be led by younger female speakers and would advance in the direction of vowel assimilation.

#### **1.5 The present study**

The present study uses cross-linguistic production data to address the following research questions relating social factors to processes of assimilation and dissimilation. First, how do the factors of gender, age, and language dominance mediate a cognate effect from Catalan in the production of Spanish /e/, demonstrating assimilation? I hypothesize that a cognate effect will occur less among less Catalan-dominant speakers, less in females than males, and less in younger speakers – despite exposure to new Catalan educational policies – as these speakers are less likely to have maintained the Catalan mid front vowel contrast, thereby inhibiting a cognate effect. Secondly, how do these social factors mediate the degree to which Spanish /e/ overlaps with each Catalan mid front vowel? I hypothesize that decreased Catalan-dominance will contribute to greater overlap (assimilation), and that female speakers and younger speakers will additionally produce categories with less acoustic distinction.

### **2 Methodology**

#### **2.1 Subject population**

Seventeen participants were recruited with flyers posted at the University of Barcelona and were stratified according to age and gender. Two generations are represented, one group between 18–25 years old and the other between 40–65 years old. All participants are bilingual in Spanish and Catalan and have lived in Barcelona for the past 10 years. All participants were connected to the University, or had been connected in the past, which may yield a similar exposure to Catalan in a professional setting across participants.


Table 1: Number of participants and mean language dominance scores (with standard deviations) across the four social cells

Each participant was assigned a Catalan-Spanish dominance score (minimum: −218; maximum: +218) after completing the Bilingual Language Profile (Birdsong et al. 2012), where a more positive dominance score is correlated with greater dominance in Catalan and a more negative dominance score is correlated with greater dominance in Spanish. Table 1 shows the distribution of participants across the four participant groups, as well as the mean language dominance score for each group. The relatively low number of speakers that are more Spanish-dominant (n = 4) and the imbalance of their distribution across the four social cells prevent the use of a categorical dominance score without crossing language dominance with other social factors. Instead, dominance will remain a continuous factor in the present analysis and will not interact with either gender or age. Although the sample size is relatively small, 3–5 participants per cell is the statistical minimum to reflect group tendencies more than individual idiosyncrasies (Tagliamonte 2006: 31). At the time of data collection, no participant reported ever having any history of speech or hearing disorders.
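The sign convention of the dominance score can be illustrated with a minimal Python sketch. The module weights and raw maxima below are assumptions based on the commonly cited BLP scoring scheme (each of four modules contributing roughly 54.5 points per language); they are not given in the text, and the function names are hypothetical.

```python
# Assumed BLP module weights (not stated in this chapter): each module is
# scaled so that it contributes about 54.5 points, for a maximum of ~218.
BLP_WEIGHTS = {"history": 0.454, "use": 1.09, "proficiency": 2.27, "attitudes": 2.27}


def language_total(raw_modules):
    """Weighted global score for one language from raw module sums."""
    return sum(BLP_WEIGHTS[name] * raw for name, raw in raw_modules.items())


def dominance_score(catalan_raw, spanish_raw):
    """Catalan total minus Spanish total: positive means Catalan-dominant."""
    return language_total(catalan_raw) - language_total(spanish_raw)


# Assumed raw maxima per module, used here only to show the score range.
maximum = {"history": 120, "use": 50, "proficiency": 24, "attitudes": 24}
minimum = {"history": 0, "use": 0, "proficiency": 0, "attitudes": 0}

print(round(dominance_score(maximum, minimum), 2))  # near the +218 ceiling
print(round(dominance_score(minimum, maximum), 2))  # near the -218 floor
```

Because the score is a simple difference, it can be entered directly as a continuous predictor, which is the choice made in the present analysis.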

#### **2.2 Materials and procedure**

All experimental sessions took place in an empty classroom at the University of Barcelona. Participants were asked if they would prefer to interact with the researcher in Spanish or in Catalan. All communication, including the consent form, questionnaire, and sociolinguistic interview, was subsequently conducted in the preferred language. First, the participants were instructed to read and sign the consent form and complete the Bilingual Language Profile (Birdsong et al. 2012), adapted as a Qualtrics survey (Qualtrics 2005). Next, the participants engaged in a sociolinguistic interview (data not analyzed in the present study), followed by two elicited production tasks, the first in Spanish and the second in Catalan. The productions of these token stimuli were recorded using a Zoom H4N Multitrack Recorder and Comica Lavalier microphone.

The Spanish word list<sup>1</sup> used in the elicited production task was stratified according to cognate status: 20 words have congruent Catalan cognates (e.g., Sp. conc[e]pto, Cat. conc[e]pte 'concept'); 20 words have incongruent Catalan cognates (e.g., Sp. inter[e]s, Cat. inter[E]s 'interest'); and 20 words have no Catalan cognate (e.g., Sp. mad[e]ra, Cat. fusta 'wood'). In order to determine the target vowel for each Catalan cognate, an online dictionary with transcriptions of the Barcelona variety of Catalan was consulted (Alcover & Moll 2002). The Catalan word list consisted only of the 40 congruent and incongruent Catalan cognates. According to the online corpus NIM (Guasch et al. 2013), all words from the Spanish word list have a relative frequency of at least 10 parts per million (ppm), and all words from the Catalan word list have a relative frequency of at least 5 ppm. In each word list, all target vowels occur in stressed syllables. Additionally, Spanish words where /e/ is followed by a palatal consonant, or either an /x/ or an /r/, were excluded, as these segments either lower or raise the F1 of /e/ (Hualde 2013: 115). Before data collection began in Barcelona, four trained linguists who are native speakers of Catalan and/or Spanish participated in a pilot study. After the experiment, none of the participants were able to identify the sound of interest, so to reduce the duration of the experiment, neither word list included filler tokens. Each word list was randomized and all participants saw the same list orders appear on a tablet in the form of isolated words.

#### **2.3 Acoustic analysis**

A total of 1,020 Spanish mid front vowels (17 participants × 20 words × 3 cognate levels) and a total of 680 Catalan mid front vowels (17 participants × 20 words × 2 target vowels) were submitted to acoustic analysis. For the Spanish data, time-aligned, word- and phoneme-segmented Praat TextGrid files were generated using Montreal Forced Aligner (McAuliffe et al. 2017) with a Spanish dictionary (Morgan 2017). The TextGrids were hand-corrected in Praat (Boersma & Weenink 2019), and a Praat script (Riebold 2013) was used to extract measurements for F1, F2, and F3 at the midpoint of each stressed /e/ phone marked in the TextGrid in order to minimize co-articulation effects upon the formant measurements. The same procedure was carried out for the 680 Catalan mid front vowels, and vowel categories were classified following the target vowels in the word list. The F1 and F2 measurements for all mid front vowels of Spanish and Catalan were normalized across vocal tract length, using the Lammert and Narayanan ΔF normalization method (Johnson 2020), which can be calculated using only a subset of vowels from the acoustic space.

<sup>1</sup> Spanish and Catalan word lists available at https://anniehelms.github.io/lsrl50\_supplemental/
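The ΔF normalization step described above can be sketched in Python as follows. This is a minimal illustration, assuming the usual formulation in which ΔF is the speaker's mean of F_i / (i − 0.5) over all measured formants and tokens, and each formant is then divided by ΔF. The function names are hypothetical and this is not the script used in the study.

```python
def delta_f(tokens):
    """Estimate one speaker's formant spacing (Delta-F) in Hz.

    tokens: list of (F1, F2, F3) tuples in Hz, one tuple per vowel token.
    Each formant F_i is divided by (i - 0.5); Delta-F is the mean of these.
    """
    terms = [f / (i - 0.5) for tok in tokens for i, f in enumerate(tok, start=1)]
    return sum(terms) / len(terms)


def normalize(tokens):
    """Divide every formant by the speaker's Delta-F (unitless output)."""
    df = delta_f(tokens)
    return [tuple(f / df for f in tok) for tok in tokens]


# A schwa-like token from an idealized uniform vocal tract:
neutral = [(500.0, 1500.0, 2500.0)]
print(delta_f(neutral))    # 1000.0
print(normalize(neutral))  # [(0.5, 1.5, 2.5)]
```

Because ΔF is estimated per speaker from whatever tokens are available, the method does not require the full vowel inventory, which is why it suits a data set containing only mid front vowels.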

#### **2.4 Statistical analysis**

To analyze a possible mid front vowel merger in Catalan, a Pillai score was calculated for the two mid front Catalan vowels for each speaker in the data set and measures of F1 were submitted to a mixed effects linear regression model. Although normally both Pillai scores and measures of Euclidean distance are employed to analyze possible mergers, the two Catalan mid front vowels predominantly differ across F1, so the regression model of F1 measures provides roughly the same information as Euclidean distance. The Pillai score is a measure of the degree of overlap between vowel categories and is calculated for each speaker from multivariate analysis of variance (MANOVA) models fitted with F1 and F2 measurements by vowel category (Nycz & Hall-Lew 2013). The Pillai scores from each speaker were calculated using a custom function and submitted to a fixed effects linear regression model using the glm() function in R (R Core Team 2018). The model includes a main effect of language dominance and a two-way interaction term of age and gender. The mixed effects linear regression model predicting F1 was built using the lmerTest package (Kuznetsova et al. 2017). This model serves as another indicator of a possible mid front vowel merger, and additionally provides information about variation occurring across this formant axis. The model contains a two-way interaction of language dominance and target catalan vowel, a three-way interaction of gender, age, and target catalan vowel, and random intercepts of participant and token word.
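To make the Pillai score concrete, the following Python sketch computes Pillai's trace for two vowel categories measured on F1 and F2. It is an illustrative stand-in, not the custom R function used in the study: it assumes a one-factor, two-level MANOVA and writes out the 2 × 2 matrix algebra by hand. The function name and the sample values are hypothetical.

```python
def pillai_score(group_a, group_b):
    """Pillai's trace for two vowel categories, each a list of (F1, F2)."""
    groups = [group_a, group_b]
    all_pts = group_a + group_b
    n = len(all_pts)
    grand = [sum(p[k] for p in all_pts) / n for k in range(2)]

    # H: between-category SSCP matrix; E: within-category SSCP matrix.
    H = [[0.0, 0.0], [0.0, 0.0]]
    E = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        mean = [sum(p[k] for p in g) / len(g) for k in range(2)]
        for r in range(2):
            for c in range(2):
                H[r][c] += len(g) * (mean[r] - grand[r]) * (mean[c] - grand[c])
                E[r][c] += sum((p[r] - mean[r]) * (p[c] - mean[c]) for p in g)

    # Total SSCP matrix and its determinant (must be non-zero).
    T = [[H[r][c] + E[r][c] for c in range(2)] for r in range(2)]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]

    # trace(H * inverse(T)), written out for the 2x2 case.
    return (H[0][0] * T[1][1] - H[0][1] * T[1][0]
            - H[1][0] * T[0][1] + H[1][1] * T[0][0]) / det


# Hypothetical normalized (F1, F2) tokens for one speaker.
close_mid = [(0.40, 2.00), (0.42, 1.95), (0.38, 2.10), (0.41, 2.05)]
open_mid = [(f1 + 0.5, f2) for f1, f2 in close_mid]  # same shape, lower vowel

print(pillai_score(close_mid, close_mid))  # identical categories: overlap is total
print(pillai_score(close_mid, open_mid))   # well-separated categories
```

Scores near 0 indicate categories that occupy the same region of the F1 × F2 space, and scores near 1 indicate clearly distinct categories; one such score per speaker is what enters the fixed effects regression.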

In order to observe a possible cognate effect within productions of Spanish /e/, F1 measures were submitted to a mixed effects linear regression model. The model included a two-way interaction of language dominance and catalan cognate vowel, a three-way interaction between gender, age, and catalan cognate vowel, and random intercepts of token word and participant. Additionally, to observe the impact of social factors on the degree of overlap between Spanish /e/ and Catalan /e/, and between Spanish /e/ and Catalan /E/, Pillai scores were calculated for each participant for each vowel category comparison. Scores were submitted to separate fixed effects linear regression models with the two-way interaction between age and gender and the main effect of language dominance. For these regression models, and all previous models, the emmeans package (Lenth 2021) was used to calculate Cohen's *d* effect sizes for pairwise comparisons and to perform necessary post-hoc tests using a Tukey pairwise comparison. The heplots package (Fox et al. 2021) was used to calculate partial eta-squared (η<sup>2</sup>) effect sizes for fixed effects models, and the r2glmm package (Jaeger 2017) was used to calculate marginal R-squared (*R*<sup>2</sup>) effect sizes for mixed effects models.
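For reference, two of the effect sizes named above have standard textbook definitions, given here as a sketch. It is an assumption, not something stated in the text, that the packages were used with their default computations.

$$
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}},
\qquad
d = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\hat{\sigma}}
$$

Here $SS_{\text{effect}}$ and $SS_{\text{error}}$ are the sums of squares for a given predictor and for the residuals, $\hat{\mu}_1 - \hat{\mu}_2$ is the estimated difference between two compared cells, and $\hat{\sigma}$ is the model's residual standard deviation.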

#### **3 Results**

#### **3.1 Catalan production**

Table 2: Regression coefficients for fixed effects linear model predicting overlap between Catalan /e/ and Catalan /E/ (Pillai scores) across the two-way interaction of age and gender and the main effect of language dominance (Dom.). The intercept is the overlap of older female speakers with a language dominance score of 0.


To look for evidence of assimilation via a cognate effect, it must first be determined if /e/ and /E/ are produced as distinct vowel categories. An initial visual examination of the Catalan mid front vowels (Figure 2) suggests that older females produce more contrasting vowels than other participants. To investigate this observation further, Pillai scores and productions along F1 were analyzed. The coefficients of the regression model of individuals' Pillai scores (Table 2) indicate that none of age, gender, or language dominance significantly impacts the degree of overlap between the two mid front vowel categories. However, the main effect of age is approaching significance, where younger speakers produce Catalan mid front vowels with greater overlap than older speakers (β = −0.17, η<sup>2</sup> = 0.565, *p* = 0.079).

Figure 2: Vowel space plot showing distribution of three vowel categories in acoustic space, and displayed across social factors of age and gender. Ellipses are drawn 1 SD from the mean. Formant values appear in normalized units, derived originally from raw hertz values.

Regression coefficients for the mixed effect linear regression model predicting Catalan F1 (Table 3) indicate significant main effects of language dominance, target catalan vowel, gender, and a significant interaction of target catalan vowel and age. A post-hoc Tukey pairwise comparison of target catalan vowel and age reveals that /e/ is produced significantly higher than /E/ for older speakers (β = 0.0771, *d* = 1.474, *p* < 0.001) and for younger speakers (β = 0.0433, *d* = 0.828, *p* < 0.05). Additionally, while the acoustic distinction between the /E/ of older speakers and the /E/ of younger speakers is not significant, older speakers produce /e/ higher than younger speakers (β = 0.0885, *d* = 1.691, *p* < 0.01). Main effects of language dominance (β = 0.00059, *R*<sup>2</sup> = 0.047, *p* < 0.05) and gender (β = −0.095, *R*<sup>2</sup> = 0.068, *p* < 0.01) indicate that mid front vowels are produced lower with increasing Catalan-dominance, and that male productions are higher than female productions. Although these social factors influence Catalan production, they do not impact the degree of acoustic distinction between the two mid front vowel categories.

Table 3: Regression coefficients for mixed effects linear model predicting Catalan F1, with a three-way interaction of age, gender, and target vowel and a two-way interaction of language dominance (Dom.) and target vowel. The intercept is older, female speakers producing Catalan /E/ with a language dominance score of 0.


#### **3.2 Spanish production**

From the Spanish word lists, F1 measures of Spanish /e/ were submitted to a mixed effects linear regression model and the output of the model appears in Table 4. The model output indicates a significant main effect of language dominance (β = 0.00052, *R*<sup>2</sup> = 0.040, *p* < 0.01), where speakers that are more Catalan-dominant produce Spanish /e/ lower than speakers that are less Catalan-dominant. The model also reveals a significant main effect of age (β = 0.049, *R*<sup>2</sup> = 0.033, *p* < 0.05), where younger speakers have lower productions than older speakers. Additionally, the main effect of gender is approaching significance, where males produce /e/ higher than females (β = 0.047, *R*<sup>2</sup> = 0.020, *p* = 0.0524). Importantly, no significant interaction contains levels of the factor catalan cognate vowel; such an interaction would have indicated a cognate effect from Catalan in the production of Spanish /e/. Accordingly, it seems that these participants do not evidence assimilation via a cognate effect to Catalan mid front vowel categories in their production of Spanish /e/.

Table 4: Regression coefficients for mixed effects linear model predicting Spanish F1 with a three-way interaction of age, gender, and catalan cognate vowel (NC = non-cognate) and a two-way interaction of language dominance (Dom.) and catalan cognate vowel. The intercept is older, female speakers with a language dominance score of 0 producing Spanish /e/ where the Catalan cognate vowel is /E/.


#### **3.3 Cross-linguistic production**

To observe how social factors mediate the degree of overlap between Spanish /e/ and each Catalan mid front vowel, Pillai scores measuring the degree of overlap between Spanish /e/ and each Catalan mid front vowel were calculated for each participant. The values were submitted to fixed effects linear regression models with main effects of language dominance and two-way interactions of age and gender. The regression coefficients for the overlap between Spanish /e/ and Catalan /e/ appear in Table 5. The model output indicates that social factors do not impact the degree of overlap of these two categories. In other words, all participants, regardless of age, gender, or language dominance, demonstrate a similar degree of overlap of Spanish /e/ and Catalan /e/. As a relative measure, Pillai scores do not convey a specific percentage of overlap in productions. However, as scores can range from 0 (merged) to 1 (unmerged), the fairly low β-coefficient of the intercept (0.156) suggests that the categories have a considerable degree of overlap.

#### 12 Sociophonetic analysis of mid front vowel production in Barcelona

Table 5: Regression coefficients for the fixed linear effects model predicting the degree of overlap between Spanish /e/ and Catalan /e/ (Pillai scores) across the two-way interaction of age and gender and the main effect of language dominance (Dom.). The intercept is the overlap of productions by older, female speakers with a language dominance score of 0.


Table 6: Regression coefficients for the fixed linear effects model predicting the degree of overlap between Spanish /e/ and Catalan /E/ (Pillai scores) across the two-way interaction of age and gender and the main effect of language dominance (Dom.). The intercept is the overlap of productions by older, female speakers with a language dominance of 0.


The regression coefficients for the model predicting the overlap of Spanish /e/ and Catalan /E/ are shown in Table 6. In this model output, the main effect of gender is significant (β = 0.238, η<sup>2</sup> = 0.99, *p* < 0.05), indicating that male speakers produce the two categories with more overlap, relative to female speakers. Additionally, the factors of age and language dominance are approaching significance, where younger speakers would evidence more overlap than older speakers (β = 0.122, η<sup>2</sup> = 0.99, *p* = 0.0893) and more Catalan-dominant speakers would evidence less overlap (β = 0.0013, η<sup>2</sup> = 0.99, *p* = 0.0664). The β-coefficient of the intercept, compared with that of the previously analyzed model, suggests that these participants have greater overlap of Spanish /e/ and Catalan /e/ than of Spanish /e/ and Catalan /E/, an observation that is supported visually in Figure 2.

#### **4 Discussion**

In order to address the first research question, whether there is a cognate effect from Catalan in the production of Spanish /e/, the Catalan data were first analyzed. The results of this analysis indicate that the categories of Catalan /e/ and Catalan /E/ have not yet fully merged for these participants, as each category was produced significantly differently across F1 by both older speakers and younger speakers. However, the results of the regression model of individual Pillai scores indicate that the degree of category overlap may be increasing in apparent-time. Since significant acoustic distinction was observed, it could be possible to find cognate effects in the Spanish production data. However, the factor of catalan cognate vowel was not a significant main effect of F1 of Spanish /e/, nor was it involved in any significant interactions, suggesting that there is no observable cognate effect, regardless of the social profile of the participants analyzed in this study.

The second research question regards the variation due to social factors in the degree of overlap between Spanish /e/ and Catalan /e/, and also between Spanish /e/ and Catalan /E/. While the overlap between Spanish /e/ and Catalan /e/ was not seen to be socially-mediated, gender was a significant predictor of the degree of overlap between Spanish /e/ and Catalan /E/, where the male participants tend to evidence more overlap between the two categories than the female participants. Contrary to the research hypothesis, neither age nor language dominance significantly impacted the degree of overlap, though both factors approached statistical significance.

Regarding the role of age, older speakers maintained less overlap of Catalan mid front categories and greater acoustic distinction across F1, thus demonstrating adherence to prescriptive norms, where Catalan /e/ and Catalan /E/ are two distinct phones. This result is in line with the research hypothesis, despite the educational changes that occurred in Catalonia following the Catalonian Linguistic Normalization law of 1983. Though exposure to Catalan instruction in school may be greater for younger speakers, the large number of immigrants that are L2 Catalan speakers attending public schools alongside native Catalan speakers may contribute to increased exposure to exemplars with a merger. Therefore, exposure to Catalan under the new educational policies does not deterministically cause younger speakers to fully adhere to prescriptive norms. These findings therefore support the SLM-r's assertion that "[q]uality of input has been largely ignored in L2 speech research [in favor of quantity] even though it may well determine the extent to which L2 learners differ from native speakers" (Flege & Bohn 2021: 32).


With regard to the role of gender, the female participants had less overlap between Spanish /e/ and Catalan /E/ than the male participants, thus conforming more to the prescriptive norm than males. That gender did not also impact the overlap of Spanish /e/ and Catalan /e/ could be attributed to the lack of salience of this distinction; one participant of this study mentioned during the sociolinguistic interview that students are taught in school that Spanish /e/ and Catalan /e/ are the same sound. Though the factor of age was only marginally significant, a visual examination of Figure 2 suggests that the younger speakers tend to conflate all three vowel categories whereas the older speakers mainly conflate Spanish /e/ and Catalan /e/. With greater statistical power, perhaps age would surface as a significant predictor of overlap, in which case the data would be consistent with a male-led change in progress. Alternatively, the lack of a significant age effect, coupled with female speakers' greater adherence to the prescriptive norm than male speakers, is consistent with a case of stable variation. However, since prior studies (e.g., Mora & Nadeu 2012) have documented the merger as recent and ongoing, I will proceed considering the merger as a possible change in progress.

Whereas females are often the leaders of community-wide change (Labov 2001), a male-led change could suggest that the social meaning indexed by a production of Spanish /e/ that is conflated with Catalan mid front vowels is stratified by gender. For example, it could be that this variant is an index of solidarity or a Catalan-identity marker (akin to lateral velarization and other phonetic phenomena; Davidson 2019) that is generally associated with males and does not provide social gain if used by females (Chappell 2016). Similarly, because bilinguals that use this variant would only have one mid front vowel category cross-linguistically, the variant could also index some trait associated with metropolitan bilingualism or a unique Barcelona identity that is a blend of Spanish and Catalan identities (Newman & Trenchs-Parera 2015). Of course, these presently speculative accounts can be empirically tested in future perception research that aims to uncover the social meanings that listeners afford to the assimilation of Spanish /e/ and Catalan /E/.

Models of second language acquisition and representation, such as the SLM, SLM-r, PAM-L2, and L2LP, predict language dominance to be an important factor in category formation. Although language dominance did not surface as a significant predictor of production in this study, the participant group was considerably homogeneous in terms of dominance, where only 4 participants were scored as Spanish-dominant by the BLP. A larger data set with more variability in language dominance could be more revealing of the importance of input in language production. Of these models, the SLM-r should make the most relevant predictions for the data, as the participants are early bilinguals rather than naïve listeners or adult learners, and the bias towards Catalan-dominance in the data set means that the majority of participants are attempting to produce a single L2 category (Spanish /e/) rather than an L2 category contrast (Catalan /e/ and /E/). Under this theory, the quality and quantity of input that participants receive of Catalan /e/ does not allow for sufficient differentiation from Spanish /e/, yielding category assimilation in their production as was seen in the Pillai scores.

Based on SLM-r, any differences in production based on gender, ethnicity, age, or other social factors must necessarily be derived from differences in L2 input. However, the connection between variants and social meaning central to variationist sociolinguistic theory is not accounted for in the SLM-r. Chappell & Kanwit (2021) attempt to reconcile this disconnect by proposing a unified framework of theoretical models of second language learning, exemplar theory, and indexical meaning to explain variable outcomes in second language perception. The present data also support this unified approach for production, where variable productions may be influenced by the mapping of social meaning onto said variants, in addition to the L2 input received. While cognitive factors such as language dominance and cognate status are central to models of bilingual phonetic representation, it is paramount to situate the bilingual individual within the context of the community and acknowledge the external social factors which also mediate variation in acquisition and production.

#### **5 Conclusions**

This study found evidence for age and gender as predictors of assimilation among productions of Spanish and Catalan mid front vowels. Younger speakers evidence less acoustic distinction between Catalan mid front vowels than older speakers, and males produce Spanish /e/ and Catalan /E/ with more overlap than females. These findings contribute a variationist sociolinguistic analysis to the literature of bilingual production of mid front vowels, demonstrating the importance of viewing models of category acquisition and phonetic representation through the lens of social factors, in conjunction with cognitive factors. Future studies of contact varieties of Spanish found in bilingual settings within Barcelona, both in the realms of production and perception, will further reveal the impact of contemporary Catalan linguistic policies, and the social meaning indexed by variation within the production of these (vocalic) and other linguistic variables.

### **Acknowledgements**

I would like to thank Justin Davidson, Ernesto Gutiérrez Topete, Ana Belén Redondo Campillos, Bernat Bardagil i Más, Yamel Nuñez, Antonio Torres Torres, the audience members at LSRL50, and two anonymous reviewers for their contributions and insightful feedback. All remaining errors are my own.

### **References**


#### Annie Helms




## **Chapter 13**

## **Prosodic correlates of mirative and new information focus in Spanish wh-in-situ questions**

### Carolina González<sup>a</sup> & Lara Reglero<sup>a</sup>

a Florida State University

> This paper examines the prosodic correlates of focus in two types of wh-in-situ questions in Spanish: information-seeking (INF), and echo-surprise (SUR). We hypothesize that they will have different intonational properties since the former are associated with new-information focus, while the latter are compatible with mirative focus since they express unexpectedness and surprise (Badan & Crocco 2019). A total of 280 sentences from a contextualized elicitation task were analyzed in Praat following SpToBI conventions. Results show that INF and SUR have similar melodic contours, involving a rise through the first pre-nuclear accent, declination, and a steep final rise on the wh-phrase. However, SUR questions have a higher nuclear peak and larger focal tonal range than INF questions. Our results show clear scaling differences in the nuclear configuration consistent with a difference between new-information and mirative focus, which can be phonologically analyzed as nuclear upstep in SUR (L+¡H\*), unlike in INF (L+H\*).

### **1 Introduction**

This study compares the prosodic correlates of focus in two types of Spanish wh-in-situ questions: Information-seeking (INF), and echo-surprise (SUR). While the main strategy to formulate a wh-question in Spanish involves wh-fronting (1a), wh-in-situ questions are also possible in some dialects, such as in North-Central Peninsular Spanish (1b) (Jiménez 1997, Uribe-Etxebarria 2002, Etxepare & Uribe-Etxebarria 2005, Reglero 2007, Reglero & Ticio 2013, among others).

Carolina González & Lara Reglero. 2023. Prosodic correlates of mirative and new information focus in Spanish wh-in-situ questions. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 269–297. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525116

	- b. ¿Rosalía llevó **qué**?

The pragmatic meanings of wh-in-situ questions in Spanish are varied. A sentence such as (1b) can be interpreted as an information-seeking (INF) question eliciting information in a neutral way (Reglero 2007, Reglero & Ticio 2013). Alternatively, (1b) can be interpreted as an echo question, i.e., a question requesting repetition of information (echo-repetition, henceforth REP) or conveying surprise (echo-surprise, henceforth SUR) (Chernova 2013, 2017, Reglero & Ticio 2013). Regardless of the pragmatic reading, the in-situ wh-element carries the main focus of the question (Horvath 1986, Rochemont 1986, Tuller 1992, Zubizarreta 1998, Escandell Vidal 1999).

In this study, we follow Reglero (2007) and Reglero & Ticio (2013) in considering INF questions as having new information focus; and we argue, based on Badan & Crocco (2019), that SUR questions in Spanish have mirative focus, which conveys counter-expectational value. Spanish INF and SUR questions display some syntactic differences, including differences in word order. In addition, impressionistic reports and a previous small-scale study suggest some intonational differences as well (González & Reglero 2018). In the present study, we investigate the prosodic characteristics of INF and SUR within a larger set of speakers, and connect these differences to focus, taking into consideration relevant studies from other Romance languages.

Our study is framed within the Autosegmental-Metrical (AM) model of intonation (Pierrehumbert 1980, Pierrehumbert & Beckman 1988, Ladd 2008), which views intonation as the anchoring of High (H) and Low (L) tones to metrically strong syllables and edges of phonological domains. We follow the conventions of the Spanish ToBI prosodic annotation system (Beckman et al. 2002, Estebas-Vilaplana & Prieto 2010, Prieto & Roseano 2010, Hualde & Prieto 2015). Stressed syllables bear pitch accents, indicated with \*. The pitch accent on the last main stress of an utterance is the nuclear pitch accent; other stressed syllables bear prenuclear accents (unless deaccented). Edges of phonological domains bear boundary tones. In Spanish, boundary tones occur at the end of full intonational phrases (IPs) and intermediate phrases (ips); these are indicated with % and -, respectively (Aguilar et al. 2009). Figure 1 below provides an example of prosodic annotation for a statement with narrow focus on the direct object. The final IP boundary is low (L%); the intermediate phrase (ip) boundary shows a steep rise (HH-). All pitch accents are rising; but while the nuclear peak is aligned with the stressed syllable (L+H\*), prenuclear peaks are delayed, i.e., aligned with the post-tonic syllable (L+>H\*).

Figure 1: Example of Spanish ToBI annotation. Participant 15. *El niño mira a su abuelo* 'The child looks at his grandfather' (narrow focus)

The rest of this paper is organized as follows. §2 contextualizes the study in connection to focus and reviews its main syntactic and prosodic characteristics. §3 introduces the methodology of the study. §4 presents the results, and §5 is the discussion. Concluding remarks are provided in §6.

#### **2 Properties of focus**

#### **2.1 Focus types**

Focus, or the information center of a sentence (Chomsky 1971, 1976), is expressed cross-linguistically in one or more of three ways: prosodically, as in English; morphologically, as in Japanese; and syntactically, as in Russian (Gutiérrez-Bravo 2008 and references therein). In Spanish, focus can be expressed prosodically and syntactically (Zubizarreta 1998, Face 2006, Chung 2012, among others).

Focus can be defined according to its size as broad or narrow, and according to its meaning as new information (or presentational), contrastive, or mirative (DeLancey 1997, Ladd 2008, Gussenhoven 2008). Under broad focus, the entire sentence is focused; this occurs when the whole sentence provides non-presupposed, new information, as shown in (2). On the other hand, under narrow focus only one sentential element is focused (3). The question in (3a) expresses the presupposition that Adriana bought something (this is the old, given information, or the sentence topic) but the value of the wh-word is unknown. The direct object in (3b) has narrow focus, and supplies the value for the variable bound by the wh-word.

	- b. [focus Adriana compró un libro.]
	  Adriana buy.pst.3sg a book
	  'Adriana bought a book.'
	- b. Adriana compró [focus un libro].
	  Adriana buy.pst.3sg a book
	  'Adriana bought a book.'

Regarding meaning, new information focus corresponds to the non-presupposed part of the sentence (Zubizarreta 1998, Chomsky 1971, 1976, Jackendoff 1972), while contrastive focus negates the value assigned to a specific variable and provides a different value for it (Zubizarreta 1998). On the other hand, mirative focus conveys surprise from unexpected information, has counter-expectational value, and transmits expressive and emotive attitude (Machuca Ayuso & Ríos 2017, DeLancey 1997, 2001, 2012, Dickinson 2000, Cruschina 2012, Gili Fivela et al. 2015, Jiménez-Fernández 2015b,a, Bianchi et al. 2016, Badan & Crocco 2019, Belletti & Rizzi 2017). The syntactic and prosodic characteristics of these focus types are reviewed next.

#### **2.2 Syntactic properties**

In Spanish, new information focus needs to appear as the rightmost element in the linear string to receive nuclear stress, i.e., to be assigned the main sentence prominence (Zubizarreta 1998, Gutiérrez-Bravo 2008, López 2009). Using the question-answer test, (4b) is ungrammatical as an answer to (4a) because the new information focus *un libro* 'a book' does not appear sentence-finally. In contrast, (4c,4d) constitute valid answers since the focus appears in the rightmost position (note that in (4d), the pause – indicated with # – effectively makes *un libro* 'a book' rightmost in the linear string).<sup>1</sup>

	- b. \* Adriana compró [focus un libro] en la librería.
	  Adriana buy.pst.3sg a book in the bookstore
	  'Adriana bought a book at the bookstore.'
	- c. Adriana compró en la librería [focus un libro].
	  Adriana buy.pst.3sg in the bookstore a book
	  'Adriana bought a book at the bookstore.'
	- d. Adriana compró [focus un libro] # en la librería.
	  Adriana buy.pst.3sg a book # in the bookstore
	  'Adriana bought a book at the bookstore.'

Contrastive focus differs from new information focus with regard to word order: any element in the sentence can be contrastively focused, regardless of sentence position (Zubizarreta 1998). One contextualized example is given in (5).<sup>2</sup>

	- a. ¿Qué compró Adriana?
	  what buy.pst.3sg Adriana
	  'What did Adriana buy?'
	- b. Adriana compró [focus un LIBRO] en la librería (no una revista).
	  Adriana buy.pst.3sg a book at the bookstore (not a magazine)
	  'Adriana bought a BOOK at the bookstore (not a magazine).'

<sup>1</sup>We follow Zubizarreta's (1998) original intuitions here, but see Ortega-Santos (2016) for a review of current experimental work that shows dialectal variation in the judgments (for example, in Argentinian Spanish, Mexican Spanish or Southern Iberian Spanish). As discussed by Jiménez-Fernández (2015b), Southern Peninsular Spanish has a specific position in the left periphery for new information focus in contrast to Standard Spanish (this includes speakers from Northern Spain and Madrid).

<sup>2</sup>Here and throughout, capitalization is used to indicate elements with contrastive or mirative focus.

Jiménez-Fernández (2015a) points to syntactic differences between contrastive and mirative focus (in the context of focus fronting). While contrastive focus can occur in an embedded sentence as a complement of a verb of saying (6), mirative focus is disallowed in this context (7) (this property was originally discussed by Cruschina (2012) for Italian):<sup>3</sup>

(6) Contrastive statement

a. Juan va diciendo que ha vendido la moto. (Jiménez-Fernández 2015a: 53)
   John go-pres.3sg say.ger that have-pres.3sg sell.ptcp the motorbike

'John goes saying that he has sold the motorbike.'

b. No, no. María dice que el coche ha vendido, no la moto. (Jiménez-Fernández 2015a: 53)
   no no Mary say.pres.3sg that the car have.pres.3sg sell.ptcp not the motorbike

'No, no. Mary says that he has sold the car, not the motorbike.'

(7) [??] ¡¡No me lo puedo creer!! ¡¡Va diciendo por ahí que DOS BOTELLAS DE VODKA nos habíamos bebido en la fiesta!! (Jiménez-Fernández 2015a: 53)
   not cl it can.pres.1sg believe.inft go.pres.3sg say.ger by there that two bottles of vodka cl have.pst.1pl drink.ptcp in the party

'I can't believe it! He goes saying everywhere that we had drunk TWO BOTTLES OF VODKA at the party!!'

As mentioned in §1, in-situ wh-elements carry the main focus of a question (Horvath 1986, Rochemont 1986, Tuller 1992, Zubizarreta 1998, Escandell Vidal 1999). Reglero (2007) and Reglero & Ticio (2013) argue that wh-phrases in INF questions have new information focus<sup>4</sup> since they elicit non-presupposed information (i.e., the value of the wh-word is unknown; see (3), (4)), and can also appear in out-of-the-blue contexts. One example is given in (8), where the question is introduced by *dime una cosa* 'tell me something', a phrase eliciting new information.<sup>5</sup> In addition, the in situ wh-phrase needs to appear finally (8b–8d) (see (4) above).

<sup>3</sup> For a discussion on verb adjacency and its interaction with contrastive and mirative focus, see Jiménez-Fernández (2015a).

<sup>4</sup> See Uribe-Etxebarria (2002) for a proposal in which in situ wh-questions in Spanish have contrastive focus. This is primarily based on a more restricted interpretation of wh-in-situ in Spanish (at least according to Jiménez's (1997) intuition). Uribe-Etxebarria provides additional examples and a syntactic analysis that relates the interpretative properties of wh-in-situ in Spanish to their syntactic derivation.

	- b. ¿Tú le diste el libro a quién?
	  you cl.dat.3sg give.pst.2sg the book to who
	  'Who did you give the book to?'
	- c. ?? ¿Tú le diste a quién el libro?
	- d. ¿Tú le diste a quién # el libro?

For SUR questions, Reglero & Ticio (2013) have argued that the wh-phrase has contrastive focus<sup>6</sup> since the echo wh-phrase does not need to appear finally (9) (see (5) above). In addition, SUR requires heavy contextualization, unlike INF (10).


Adela fue a visitar a Aristóteles.
Adela go.pst.3sg to visit.inft dom Aristotle
'Adela went to visit Aristotle.'

Speaker 2: ¡No me lo puedo creer!: ¿Adela fue a visitar a QUIÉN?
neg cl.1sg cl.acc.3sg can.1sg believe.inft Adela go.pst.3sg to visit.inft dom who
'I can't believe it! Adela went to visit WHO?'

<sup>5</sup>This test is attributed to Ignacio Bosque (p. c.) (Reglero & Ticio 2013). See also González & Reglero (2018, 2020).

<sup>6</sup>Their claim applies to REP echo questions as well.

However, recent work on Italian argues that SUR in this language is associated with Mirative Focus (MirF) (Crocco & Badan 2016, Badan et al. 2017, Badan & Crocco 2019). MirF is a type of focalization involving surprise and unexpectedness. In Italian, SUR and INF questions have different syntactic properties: the most obvious one is that the wh-phrase needs to be fronted in INF, unlike in SUR (5a,5b).<sup>7</sup> Unlike in INF, the wh-phrase in Italian SUR is D-linked to the previous discourse. Both types of questions also show prosodic differences, as discussed in the following section.


'They sell (them) where the almonds?'

#### **2.3 Prosodic properties**

Prosodically, focused constituents tend to stand out over topics. As in many languages, in Spanish, focused elements can constitute separate intonational units (Gutiérrez-Bravo 2008). A high intermediate boundary tone (H- or HH-) can occur between the old (topic) and new information (focus) (Hualde 2014: 268–270; Hualde & Prieto 2015: 369). In addition, non-focal elements tend to have reduced pitch range (see for example De la Mota 1997, Face 2002a).

The realization of both prenuclear (non-final) and nuclear accents tends to differ in broad and narrow focus statements. In Madrid Spanish, pre-nuclear accents tend to have a higher pitch under narrow focus and/or be aligned with the stressed syllable, unlike under broad focus, where the peak tends to be displaced to the post-tonic (Face 2001). Stressed syllables are also longer under narrow focus in this dialect (Face 2000). In Castilian Spanish, nuclear accents tend to have a low pitch accent (L\*) under broad focus, and rising (L+H\*) under narrow focus (Estebas-Vilaplana & Prieto 2010). However, in Spanish contact varieties, including in contact with Basque, prenuclear accents tend to have earlier peaks under broad focus, as well (Elordieta 2003, O'Rourke 2012).

<sup>7</sup> See Badan & Crocco (2019) for additional differences in embedded contexts (related to question availability and scope). They propose overt movement of the wh-phrase to a low focus position (MirF) in echo questions.

There are also prosodic differences between contrastive and new information focus. The former is characterized by higher pitch, expanded pitch range and/or earlier pitch alignment compared to the latter, at least in statements. In addition, an intermediate high or low boundary (H-, L-) can follow the contrastively focused constituent (De la Mota 1997, Face 2002a,b). Contrastive focus shows longer duration than new information focus sententially, in the focal constituent, and in its stressed syllable (Chung 2012). However, sentence-final elements in narrow focus appear to have similar pitch height and show early peak alignment, unlike in statements with broad focus, where late alignment is more frequent (Domínguez 2004).

As mentioned in the previous section, wh-in-situ elements in Spanish are focused and are assigned nuclear stress since they are located at the end of the intonational phrase. The rest of the sentence is the topic since the information is presupposed. Impressionistic reports on the prosody of INF questions mention falling intonation and extra or "marked" stress (Escandell Vidal 1999: 63; Uribe-Etxebarria 2002, Reglero & Ticio 2013). On the other hand, in-situ echo questions, particularly those conveying surprise, reportedly display (falling)-rising or sharp/strong intonation and have marked stress on the wh-phrase (Contreras 1999, Pope 1976, Escandell Vidal 1999, Sobin 2010, Chernova 2013, 2017).

A preliminary investigation of wh-in-situ questions produced by four speakers of North-Central Peninsular Spanish shows that INF have final rising intonation more often than SUR. The latter show an expanded sentential tonal range, and a substantially higher final High pitch compared to INF. On the other hand, the duration ratio of the wh-element (i.e., its duration relative to the sentence duration) is larger in INF than in SUR (González & Reglero 2018). These preliminary findings contradict the falling/falling-rising distinction previously reported for INF and SUR, but suggest that marked stress in INF is a perceptual result of increased duration of the wh-element, while sharp/strong intonation in SUR is related to expanded scaling and an elevated final pitch accent/boundary tone (for stress correlates in Spanish, see Ortega-Llebaria & Prieto 2007, 2011, Hualde 2014).

These preliminary results are also in line with other studies investigating intonational differences in pragmatic meaning for Spanish questions. For example, fronted wh-questions with a counter-expectational value have expanded pitch ranges compared to neutral questions. This difference usually goes hand in hand with a difference in boundary tone (Argentinian Spanish: Gabriel et al. 2010) or nuclear configuration (Peninsular Spanish: Estebas-Vilaplana & Prieto 2010; Hualde & Prieto 2015: 374; Mexican Spanish: De la Mota et al. 2010; Venezuelan Spanish: Astruc et al. 2010).<sup>8,9</sup> In addition, although Castilian Spanish echo questions tend to be realized with upstepped rising nuclear accents (L+¡H\*), those with a counter-expectational value tend to have a sharp final rise (HH%) instead of a low boundary tone (L%) (Estebas-Vilaplana & Prieto 2010).<sup>10</sup>

Although earlier work considers that surprise echo questions have contrastive focus in Spanish, recent work on Italian suggests that mirative focus is involved since SUR questions have counter-expectational value (Badan & Crocco 2019). In addition to showing clear syntactic differences, SUR wh-in-situ questions in Italian are different prosodically from INF questions in several respects. First, the wh-phrase carries the main prominence of the sentence in SUR but not in INF contexts, where the main prominence falls on the verb. Second, the wh-phrase in SUR shows expanded scaling and has an upstepped rising pitch accent (L+¡H\*); in comparison, INF questions have falling pitch accents, which are closely aligned with the verb. Finally, SUR questions have a high boundary tone after the wh-element and a clearly perceived disjuncture with the rest of the question. In contrast, in INF, the verb is followed by a low boundary tone, and a clear disjuncture is not typically perceived.

Assuming that INF have new information focus and SUR mirative focus, we explore the intonational properties of both question types to elucidate the prosodic characteristics of both types of focus. We examine data from 14 speakers of North-Central Peninsular Spanish, where non-fronted wh-in-situ questions can have a new information reading, in addition to echo readings. Two specific hypotheses are investigated: First, if Spanish INF and SUR have different foci, they will have distinct prosodic properties. Second, if SUR have MirF, they will differ from INF in one or more of the following: (i) intonational contour, (ii) pitch range, and/or (iii) F0 (Crocco & Badan 2016, Huttenlauch et al. 2016, Badan et al. 2017, Machuca Ayuso & Ríos 2017, Badan & Crocco 2019, among others).

#### **3 Methodology**

#### **3.1 Participants and data collection**

Our participants are Spanish speakers from the Basque Country in northern Spain. Although bilingualism in Spanish and Basque is prevalent, and language contact with Basque influences some prosodic characteristics of Spanish in this area (Elordieta & Romera 2020), the impact of language contact is considered to be minimal or non-existent for this study since Basque does not allow in-situ information or surprise echo questions (Etxepare & Ortiz de Urbina 2003, Reglero 2003).<sup>11</sup>

<sup>8</sup> In Ecuadorian Spanish, pitch range exclusively distinguishes between the two (Huttenlauch et al. 2016).

<sup>9</sup>A similar prosodic combination is also reported in Catalan and Italian (Gili Fivela et al. 2015, Prieto et al. 2015).

<sup>10</sup>In Brazilian Portuguese, neutral INF questions have falling intonation, while echo ones are rising (Kato 2019).

Data was collected in Summer 2015 in Bilbao, Spain. Participants completed two tasks: a reading task, and a controlled elicitation task. Both were facilitated via a PowerPoint presentation that included visual and auditory stimuli to provide contextual information, engage participants in the task, and prompt the relevant pragmatic reading. Both tasks were designed to control the context and therefore the pragmatic reading of the stimuli. The reading task is most similar to the methodology employed in other intonational studies of Spanish, including Prieto & Roseano (2010) and Rao (2013), and can be conceived of as involving "scripted speech". The controlled elicitation task, which we focus on in this paper, did not include a written script for participants to read from, and was designed to provide a more naturalistic realization of the stimuli.

The complete experiment took approximately one hour per participant. A total of 22 Spanish participants took part in the experiment; all were paid for their participation. Participants had varied degrees of Spanish-Basque bilingualism. Before the tasks, all participants completed a consent form and the Bilingual Language Profile (BLP; Birdsong et al. 2012), which provides information on their language history, use, proficiency, and attitudes towards Spanish and Basque. For this study, we report data from the elicitation task for 14 participants; all were females aged 21–24 from the province of Bizkaia.

Table 1 provides additional participant information. Positive BLP dominance scores indicate Spanish dominance; scores close to zero indicate balanced bilingualism. Negative dominance scores indicate that participants are Basque dominant. Only three participants have negative dominance scores (P3, P7, P14); two of them are close to zero (P14, P7).

<sup>11</sup>Echo wh-questions in Basque are usually preverbal (Etxepare & Ortiz de Urbina 2003), as shown in the example below:

	- b. Nigandik ZER atera dela?
	  me.from what come aux.that
	  '(That) what has come from me?'

Etxepare & Ortiz de Urbina (2003: 463) indicate that echo wh-questions with corrective/contrastive focus can appear finally with a preceding prosodic break; these are quite marked. Duguine & Irurtzun (2014) indicate that young Labourdin Basque speakers use an innovative strategy involving wh-in situ. None of the participants in our study come from this dialectal area.


Table 1: Participant information: Provenance and BLP scores

The target sentences for the elicitation task involved fronted and in-situ wh-questions, statements, and yes-no questions. Here we focus on in-situ SUR and INF questions. Contextualized examples are provided below; note that all participants completed a short practice before the tasks, and that the context and prompt were presented aurally (not in written form).

(12) a. Context/Prompt:

*Maite, Cristina, y Elena se han puesto a jugar al escondite con una amiga. Maite se ha escondido detrás de un árbol. Cristina detrás de un arbusto. Para preguntar por Elena una posibilidad sería decir: ¿y dónde se ha escondido Elena? ¿Cuál sería la otra manera de decirlo?* 'Maite, Cristina and Elena are playing hide-and-seek with a friend. Maite hid behind a tree. Cristina hid behind a bush. To ask about Elena, one possibility would be to say: And where did Elena hide? What would be another way to ask this question?'

b. Expected target question:
   ¿(Y) Elena se ha escondido dónde?
   and Elena cl.refl have.prs.3sg hide.ptcp where
   '(And) where did Elena hide?'

#### (13) a. SUR question:

*Estás en la casa de una amiga y te enseña sus mascotas. Te dice: "El gato se llama Macacocogito." Te sorprende muchísimo el extraño nombre de su gato. Hazle una pregunta para comprobar cómo se llama.* 'You are at your friend's house, and she shows you her pets. She says: "The cat's name is Macacocogito." You are completely surprised by the cat's unusual name. Ask your friend a question to double-check the cat's name.'

b. Expected target question:
   ¿El gato se llama CÓMO?
   the cat cl.refl name.prs.3sg how
   'The cat's name is WHAT?'

#### **3.2 Recording and coding**

Recording was conducted via a Tascam DR-05 digital recorder with built-in omnidirectional microphones. Audio was recorded at 44,000 Hz in mono. Ten INF questions and ten SUR questions were examined per participant, for a total of 280 target sentences. Eight INF and six SUR questions had to be discarded because of waveform distortion and/or wh-fronting, leaving 266 sentences for the acoustic analysis.

Data was coded in Praat (Boersma & Weenink 2021) according to Spanish ToBI conventions (Aguilar et al. 2009, Face & Prieto 2007 inter alia). Both authors were involved in the acoustic analysis. Disagreements, which occurred in approximately 5% of the tokens, were resolved by consensus. The analysis focused on the following characteristics: (i) the overall melodic shape of the question, (ii) its nuclear configuration, (iii) the nuclear peak (in Hz), and (iv) the focal tonal range (FTR), i.e., the difference between the lowest point at the beginning of the wh-phrase and its highest pitch. Pitch is reported in Hz and semitones (ST); the latter helps normalize the data and is more closely related to pitch perception. Specifically, a difference of 1.5 ST meets the perceptual threshold, i.e., it is considered to be perceivable by all speakers ('t Hart 1981, Toledo 2000, Pamies-Bertrán et al. 2002). Paired two-tailed t-tests were conducted to establish whether the differences between INF and SUR are statistically significant.
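The relation between the Hz and ST measures described above can be sketched as follows. This is a minimal illustration, assuming the standard logarithmic conversion (12 times the base-2 logarithm of the ratio between two F0 values); the function names and the example valley and peak values are hypothetical and are not taken from the study's data or scripts.

```python
import math

# Perceptual threshold cited in the text (in semitones).
PERCEPTUAL_THRESHOLD_ST = 1.5


def interval_in_semitones(f_high_hz: float, f_low_hz: float) -> float:
    """Interval between two F0 values in semitones: 12 * log2(f_high / f_low)."""
    if f_high_hz <= 0 or f_low_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def focal_tonal_range(valley_hz: float, peak_hz: float) -> tuple[float, float]:
    """Return the focal tonal range as (difference in Hz, interval in ST).

    valley_hz: lowest F0 at the beginning of the wh-phrase (hypothetical input)
    peak_hz:   highest F0 reached in the wh-phrase (hypothetical input)
    """
    return peak_hz - valley_hz, interval_in_semitones(peak_hz, valley_hz)


def is_perceivable(difference_st: float) -> bool:
    """True if a difference in ST reaches the 1.5 ST threshold."""
    return abs(difference_st) >= PERCEPTUAL_THRESHOLD_ST


# Hypothetical example: a rise from 200 Hz to 300 Hz within the wh-phrase.
ftr_hz, ftr_st = focal_tonal_range(200.0, 300.0)
print(f"FTR: {ftr_hz:.0f} Hz = {ftr_st:.1f} ST")
```

Because the semitone scale is logarithmic, the same Hz difference corresponds to a larger ST interval when it starts from a lower valley, which is why the ST values are better suited for comparison across speakers.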

Figures 2–5 below provide examples of melodic contours for INF and SUR. Figure 2 exemplifies the most frequent INF contour; it begins with an initial fall followed by a rise up to the first post-tonic syllable, which diphthongizes with the auxiliary verb to its right. Declination follows up to the beginning of the wh-phrase, which is realized with a steep final rise (L+H\* HH%). The FTR is 183 Hz, equivalent to 10.7 ST.

Figure 2: INF question. P21\_12 'And when has the third one gone out?'

Figure 3 exemplifies an additional melodic pattern for INF, which starts with a slight initial fall up to the post-tonic syllable, followed by a slight rise on the verb *fue*. Declination ensues, and the wh-phrase shows a final rise-fall (L+H\* L%). The FTR is 158 Hz, equivalent to 9.8 ST.

Figure 4 shows a third melodic pattern for INF in our data, involving a rise up to the wh-word, followed by a final fall-rise (H+L\* LH%). The FTR is 89 Hz, equivalent to 7.4 ST.

SUR questions were realized similarly across participants. They involved an initial rise up to the first post-tonic syllable, declination up to the wh-phrase, and a steep final rise (Figure 5). The nuclear configuration can be characterized as L+H\* HH%, as in Figure 2. The FTR is 191 Hz (11.6 ST).

#### **4 Results**

#### **4.1 Overall melodic contour**

All SUR questions in our dataset show three intonational movements: (i) a rise through the first post-tonic syllable; (ii) declination (i.e., pitch lowering) up to the wh-phrase, and (iii) a steep final rise (Figure 5). For INF questions, a similar pattern occurs in 85% of cases, although an additional fall is usually present at the beginning (Figure 2). This fall occurs in cases where INF begin with *y* 'and', a pragmatic strategy available in INF questions to establish a transition between the previous discourse and the wh-in-situ question (Jiménez 1997). Two additional melodic contours are attested for INF: one characterized by a final rise-fall (7.5%) (Figure 3), and another with an overall rise up to the beginning of the wh-phrase followed by a nuclear fall-rise (7.5%) (Figure 4). Most of these less frequent patterns are found in speakers 15 and 8, respectively.

Figure 3: INF question. P15\_4 'And where did Marian go?'

Figure 4: INF question. P18\_13 'And how does Alejandra go up?'

Figure 5: SUR question. P3\_3 'The cat's name is WHAT?'

#### **4.2 Nuclear configuration**

All SUR questions and most INF questions end in a high (HH%) boundary tone. The main exceptions are participant 15, showing a low boundary tone (L%) in 60% of INF, and participant 8, with a rising (LH%) boundary tone in 50% of INF questions. Low or rising boundary tones are also found sporadically in participants 3, 7 and 13.

The realization of the nuclear accent is more variable. Table 2 provides more information about the dominant nuclear configuration and its frequency per participant and type of question investigated. It can be observed that 10 of the participants analyzed (71%) show similar nuclear pitch accents in both INF and SUR: five of them have a rising nuclear pitch accent (L+H\*), and five show a low nuclear pitch accent (L\*).

The four remaining participants have different nuclear pitch accents in INF and SUR. Three of the participants (P7, P8, P13) have a low or falling pitch accent (H+L\*) in INF questions, and a rising pitch accent in SUR questions. Participant 15 shows a preference for a rising pitch accent in INF (L+H\*), and a low pitch accent (L\*) in SUR. As stated above, this participant tends to realize low or rising boundary tones in INF questions.


Table 2: Nuclear configurations

There is no apparent correlation with bilingualism; the patterns shown by Basque dominant speakers P3, P7 and P14 are variable and comparable to those attested in Spanish dominant participants.

#### **4.3 Nuclear high**

Figure 6 shows the values of the nuclear High for all participants in INF and SUR. Eleven participants (79%) have a more elevated H in SUR. On average, the value of H in SUR contexts is 2.1 ST higher than in INF questions. This difference is above the perceptual threshold, suggesting that it is perceptually significant. Results from a paired two-tailed t-test indicate that this difference is statistically significant (*p* = 0.0038). The examination of individual differences shows that the perceptual threshold is reached or surpassed in 8 of the participants. The remaining three participants do not follow this trend. Specifically, participants P9 and P11 have a more elevated nuclear High in INF contexts, while P5 shows a similar nuclear High in both pragmatic readings (Appendix A).

Figure 6: Nuclear High in INF and SUR questions

#### **4.4 Focal tonal range**

Figure 7 shows a box plot for the focal tonal range of INF and SUR questions for all participants pooled. It can be observed that the medians of INF and SUR differ clearly. On average, the FTR for SUR is 2.9 ST higher than for INF, well above the perceptual threshold. In addition, results from a paired two-tailed t-test indicate that this difference is statistically significant (*p* < 0.001). The examination of individual differences shows that this perceptual difference holds for 11 participants. For participant P9, this difference approaches the perceptual threshold (1.4 ST). Two participants do not follow this trend: P11, who has a higher FTR in INF, and P13, who has a similar FTR in both INF and SUR (Appendix B).
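The paired comparisons reported in this section can be illustrated with a short sketch. It assumes one mean value per participant and question type and computes the paired t statistic by hand; the values below are invented for illustration and are not the study's measurements. The p-value would then be obtained from the t distribution with n - 1 degrees of freedom (for example, `scipy.stats.ttest_rel` performs the same paired test and returns the p-value directly).

```python
import math
from statistics import mean, stdev


def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Paired t statistic for second - first, plus degrees of freedom (n - 1)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical per-participant mean FTR values in ST (NOT the study's data).
inf_ftr_st = [10.0, 9.0, 11.0, 8.0]
sur_ftr_st = [12.0, 12.0, 12.0, 12.0]

t_value, df = paired_t(inf_ftr_st, sur_ftr_st)
print(f"t({df}) = {t_value:.3f}")
```

A paired design is appropriate here because each participant contributes values for both INF and SUR, so the test operates on within-speaker differences rather than on the two pooled samples.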

#### **5 Discussion**

The present study set out to investigate the prosodic characteristics of two types of pragmatically different wh-in-situ questions in Spanish: those requesting new information (INF), and those expressing surprise (SUR). Both share some syntactic similarities, since the wh-in-situ phrase is sentence-final. Our analysis reveals some prosodic similarities as well: the general melodic contour tends to be similar for both in most speakers, generally comprising an initial rise, medial declination, and a steep final rise on the wh-phrase. In addition, a high (HH%) final boundary tone tends to be present in both question types.

Figure 7: Focal Tonal Range in INF and SUR questions

Syntactically and pragmatically, INF and SUR also show some differences. INF are neutral and restricted to the rightmost position in the linear string, while SUR are counter-expectational and have a less restricted distribution. Prosodically, we find some differences as well: the nuclear High is significantly more elevated in SUR, and the focal tonal range is significantly expanded. A difference in FTR occurs in most participants, suggesting that this is the main prosodic cue distinguishing SUR from INF in this Spanish variety. We do not observe differences according to degree of Basque/Spanish bilingualism. This is expected since, although language contact impacts the realization of some prosodic features in both languages (see for example Elordieta 2003), the wh-in-situ questions investigated here for Spanish are not grammatical in Basque (Etxepare & Ortiz de Urbina 2003, Reglero 2003).

The intonational properties identified in this study for Spanish SUR are comparable to those reported for Italian SUR questions (Badan & Crocco 2019). At first blush, unlike for Italian, the nuclear configurations of the wh-in-situ phrase in Spanish INF and SUR are similar, as in German, where the tonal contours of INF and SUR are reportedly the same (Repp & Rosin 2015). However, we argue that Spanish INF and SUR have distinct nuclear contours: INF is most frequently realized with a rising nuclear accent (L+H\*), while SUR involves upstepping (L+¡H\*). The difference between these two tonal configurations is reportedly one of pitch range, as shown schematically in Figure 8. Upstepped rising nuclear accents are attested in Italian SUR (Badan & Crocco 2019) and in Spanish counter-expectational questions (Aguilar et al. 2009, Estebas-Vilaplana & Prieto 2010; Hualde & Prieto 2015: 374).

Figure 8: Rising vs. upstepped rising pitch accents (Aguilar et al. 2009)

The participants in our dataset have different degrees of Basque/Spanish bilingualism. We have not observed prosodic differences consistent with Spanish vs. Basque language dominance. Three participants (P5, P9, P13) show individual variation, with either an elevated nuclear peak or a higher FTR in SUR questions, but not both. Only P11 appears to be exceptional since she shows higher F0 and expanded FTR in INF than in SUR, unlike the rest of the participants. We leave open the possibility that low statistical power and/or individual variation explains this different pattern.

#### **6 Conclusion**

This study has focused on the intonation of INF and SUR questions in Spanish. Results from an elicitation task in 14 female speakers from North-Central Peninsular Spanish show similarities in overall melodic contours and final boundary tones, but also differences in the height of the nuclear accent, the focal tonal range, and the nuclear pitch accent. We argue, following Badan & Crocco (2019) for Italian, that these differences are consistent with a difference between new information and mirative focus.

The analysis of intonation from the five male speakers remaining in our dataset and from the reading task will be relevant to further ascertain the patterns reported here and to inquire into possible gender differences in the intonation of wh-in-situ questions in Spanish. Future studies should investigate additional correlates of focus, including the presence of intermediate boundaries before the wh-element, wh-phrase duration and intensity, and the realization of pre-nuclear peaks (Chung 2012, Face 2001, 2002b, Gryllia et al. 2016 inter alia).

We would also like to note that the investigation of SUR questions in French would be of great interest to further elucidate the prosodic properties of MirF in Romance. Glasbergen-Plas et al. (2020) show that INF and repetition (REP) in-situ questions have similar tonal contours in French; however, REP wh-questions have extended pitch scaling and longer duration (cf. Déprez et al. 2013, Cheng & Rooryck 2000, Gryllia et al. 2016). Based on our current understanding of in-situ questions in Italian and Spanish, we consider it extremely likely that SUR questions in French will have even wider scaling than REP, and/or might involve a different tonal contour compared to REP and INF.

#### **Abbreviations**


#### **Acknowledgements**

We thank all participants for their time, and Eric Martínez for helping design the materials for this study. Many thanks to Jessica Craft for her enthusiasm and professionalism collecting the data and preparing it for acoustic analysis, and to Erin Christopher, Tyler King and Gus O'Neil for their assistance with coding. We are also grateful to Alex Iribar at the Phonetics Lab at the University of Deusto for kindly allowing us to use their facilities; Jon Franco, who passed away in 2021, for his help recruiting participants and his support and encouragement; and Mark Amengual for assistance with the BLP. We also gratefully acknowledge the suggestions from three anonymous reviewers, and the assistance of the volume editors and Luis Avilés González in the preparation of the final version of this manuscript. All errors are of course ours. This research was funded by an FSU COFRS grant awarded to the second author in 2014–2015.

Table 3: Nuclear High


#### **Appendix A**


#### **Appendix B**


Table 4: FTR

#### **References**


## **Chapter 14**

## **Mechanical vs. functional processes in subject pronoun expression in Spanish second language learners**

### Ana de Prada Pérez<sup>a</sup> & Nick Feroce<sup>b</sup>

<sup>a</sup>Maynooth University <sup>b</sup>Lexia Learning

Subject pronoun expression in Spanish is regulated by functional factors while the role of the mechanical factor priming, or perseveration, has been a source of debate. Additionally, 3sg subjects have been identified as more difficult to acquire than 1sg in L2 Spanish. In this paper we explore the interaction between Switch Reference, Form of Previous Subject (perseveration), and Speaker group in 1sg and 3sg subjects in L2 Spanish through the oral narratives of 28 Spanish L2ers (15 higher and 13 lower proficiency) and 9 Spanish-English bilingual native speakers (NSs). Results showed differences between L2ers and NSs in rates of overt pronominal subjects and in sensitivity to Switch Reference but not in the effect of priming. We hypothesize the differences in the interactions between variables in 1sg vs. 3sg could be due to 3sg involving reference-tracking and perseveration only being evident in contexts where the pronoun does not signal pragmatic content.

### **1 Introduction**

There is a longstanding tradition of variationist work comparing different monolingual as well as bilingual varieties with respect to subject pronoun expression (SPE). All these studies show that, while there is extant variation across varieties in terms of rates of overt pronominal subjects, few differences are found in the variables that condition the distribution. They also convincingly show that the distribution can only be explained by a combination of variables and that it is variable, except for some non-variable cases (see list of cases in Cameron 1997: 33–36). This implies that there are no ungrammatical or inappropriate uses in the case of specific variable uses, but there may be differences in the trends of use across speaker groups. Variationist methodologies have been widely applied to the acquisition of SPE in Spanish as a second language (L2). This paper offers further evidence of progression of acquisition of SPE in Spanish using variationist methodologies.

Ana de Prada Pérez & Nick Feroce. 2023. Mechanical vs. functional processes in subject pronoun expression in Spanish second language learners. In Barbara E. Bullock, Cinzia Russi & Almeida Jacqueline Toribio (eds.), *A half century of Romance linguistics: Selected proceedings of the 50th Linguistic Symposium on Romance Languages*, 299–321. Berlin: Language Science Press. DOI: 10.5281/zenodo.7525118

Several variables affect the distribution of overt and null subjects in Spanish. Among the functional factors that have been found to affect the use of an overt pronominal subject are those that favor continuity in the discourse. For instance, the variable Switch Reference, where the referent is the same or different from the previous subject, has been reported to have a large effect on the distribution of expressed vs. unexpressed pronominal subjects, as in (1). Participant 2 used a null subject in the second clause, which has the same referent as the first clause.

(1) Yo jugué fútbol durante la escuela y asistí a una escuela privada para trece años.

I play.1sg.past soccer during the school and attend.1sg.past to a private school for thirteen years

'I played soccer during my school years and I attended a private school for thirteen years.'

(Participant 2, lower proficiency group)

Also highly ranked is the variable Grammatical Person. Several studies, particularly on monolingual varieties, report higher rates of overt pronominal subjects in first (1sg) than in third person singular (3sg), while the results are different for bilingual speakers (with the exception of Travis & Torres Cacoullos 2012) and second language learners (L2ers), who produce more expressed subjects in 3sg than in 1sg (Geeslin & Gudmestad 2011, 2016, Gudmestad & Geeslin 2010, Gudmestad et al. 2013, Lozano 2009, 2016, Prada Pérez & Feroce 2020), as shown in (2). Participant 1 uses the overt pronoun with a third person singular subject in a context of same reference as the previous sentence.

(2) Es mayor. Ella tiene 35 años y una hija.

be.3sg.pres older she have.3sg.pres 35 years and a daughter

'She is older. She is 35 and has a daughter.'

(Participant 1, higher proficiency group)

Additionally, previous work has included the mechanical factor of Perseveration, or priming. Previous research refers to priming or perseveration as a mechanical factor in contrast with functional factors, which are guided by meaning and communication (e.g. Travis & Torres Cacoullos 2012; however, see Otheguy 2015). An increased use of the overt form has been reported for instances where the previous mention was also an overt form, both for native speakers (NSs) (e.g. Cameron 1994) and L2ers (e.g. Abreu 2009), as in (3). Participant 5 uses an overt pronominal subject in the first clause and continues to use it with the same referent in subsequent sentences.

(3) Él trabaja en la oficina de un abogado pero él no es un abogado. Él trabaja con casos específicos.

he work.3sg.pres in the office of a lawyer but he neg be.3sg.pres a lawyer he work.3sg.pres with cases specific

'He works in an attorney's office but he is not a lawyer. He works on specific cases.'

(Participant 5, higher proficiency group)

The role of proficiency in the L2 acquisition of Spanish SPE has been previously examined in the literature. Overall, participants tend to use fewer overt subjects as proficiency increases, except for one dataset examined in Geeslin et al. (2015) and Geeslin et al. (2013), where they reported an inverted u-shape for pronoun rates across proficiency. It has previously been proposed that L2ers at beginner levels might omit subjects, relying greatly on pragmatics and discourse context until their grammar is more developed. For intermediate and more advanced levels of proficiency, however, there is a consensus that development of L2 SPE rates moves from more overt pronominal subjects to fewer overt pronominal subjects. In this paper, we further examine the role of proficiency. We specifically examine possible differences in rates, patterns (as instantiated in the conditioning factors regulating the distribution), and their interaction with priming across different speaker groups, including two proficiency groups of L2ers. Our aim is to compare higher and lower proficiency L2ers to bilingual NSs in their use of Spanish SPE, with respect to (i) rates of overt pronominal subjects, (ii) patterns of use as instantiated in functional factors, and (iii) the effect of the mechanical factor, perseveration.

#### **2 Background and motivation**

#### **2.1 Subject expression in Spanish**

Variationist analyses of Spanish subject expression have indicated that SPE in Spanish is best explained by a combination of variables (Carvalho et al. 2015), except in those cases traditionally excluded from variable rule analyses: predicates that require an expletive subject, predicates accompanying impersonal uses of the second person singular and third person plural, reverse psychological predicates, predicates in subject relative clauses, subjects with inanimate referents, and predicates in set phrases. In each case, the null pronominal subject fails to alternate with an overt counterpart.

In general terms, null subjects tend to indicate continuity in variable contexts. Thus, the factor Switch Reference, or whether the subject in the preceding sentence is the same or not, favors the use of a null subject when the previous subject has the same referent. Another relevant factor is the variable Person. Previous research indicates that Spanish speakers use more overt subjects in 1sg than in 3sg. Travis & Torres Cacoullos (2018) explored differences in 1sg and 3sg subjects with respect to accessibility of reference. They reported that accessibility affected 1sg subjects at a shorter distance from the previous mention than 3sg subjects. Although there are other factors identified in previous research, Switch Reference and Person are significant and highly ranked (i.e., larger magnitude of effect) across studies and are, thus, the object of study in this paper.

In addition to functional factors, there is one mechanical factor that has received attention in the previous literature, perseveration or priming. Several previous studies using variable rule analyses have reported that pronouns lead to pronouns and null subjects to null subjects (Abreu 2012; Cameron 1994; Cameron & Flores-Ferrán 2004; Flores-Ferrán 2002; Travis & Torres Cacoullos 2012; Travis 2007). In a rather different approach, Otheguy (2015) presented cross-tabulated data from eight interviews from Otheguy & Zentella's (2012) NYC corpus and concluded that there was no priming effect. Otheguy (2015) reported the rates of overt pronominal subjects preceded by pronominal subjects vs. a different type of subject and the rates of null subjects preceded by null subjects vs. a different type of subject. In their data, null subjects tended to be preceded by null subjects (perseveration) while overt pronominal subjects tended to be preceded by a different type of subject (interspersion). Given the preponderance of null subjects in Spanish, it is likely that the analysis was affected by their frequency. In variable rule analyses, in contrast, the analyses compared the likelihood of having an overt pronominal subject vs. a null subject when the preceding subject was also pronominal vs. when it was null or lexical or any other form. Thus, although there is some discrepancy, overall, in Spanish, previous research concludes that a pronominal subject increases the likelihood of using a pronominal subject in a subsequent clause.
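The difference between the two ways of tabulating priming can be illustrated with a small sketch. The counts below are invented purely for illustration (they do not come from Otheguy 2015 or from any corpus), and the function names are hypothetical. The first function asks what share of a given form is preceded by the same form; the second asks how likely an overt pronoun is after each previous form, which is closer to what a variable rule analysis estimates.

```python
# Invented counts of (previous subject form, current subject form) pairs.
# These numbers are for illustration only and are not taken from any study.
counts = {
    ("null", "null"): 600,
    ("null", "overt"): 150,
    ("overt", "null"): 150,
    ("overt", "overt"): 100,
}


def share_preceded_by_same(counts, form):
    """Of all current subjects with this form, the share preceded by the same form."""
    same = counts[(form, form)]
    total = sum(n for (prev, cur), n in counts.items() if cur == form)
    return same / total


def overt_rate_after(counts, prev_form):
    """Rate of overt current subjects, given the form of the previous subject."""
    overt = counts[(prev_form, "overt")]
    total = sum(n for (prev, cur), n in counts.items() if prev == prev_form)
    return overt / total


for form in ("null", "overt"):
    print(form, "preceded by same form:", share_preceded_by_same(counts, form))
for prev in ("null", "overt"):
    print("overt rate after", prev, ":", overt_rate_after(counts, prev))
```

With these invented counts, most overt pronouns follow a different form simply because null subjects are far more frequent, yet the rate of overt pronouns is still higher after an overt subject than after a null one. The two tabulations can therefore point in different directions without contradicting each other.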

Lastly, it is important to note that there is an interaction between Switch Reference and Priming. Cameron & Flores-Ferrán (2004) reported that the priming effect is larger in contexts of same reference than in contexts of switch reference. Similarly, the priming effect was larger in 1sg than in 3sg subjects in Heritage Speakers of Spanish (HS) in Prada Pérez (2020).

#### **2.2 Subject expression in L2 Spanish**

Across studies, the variable Switch Reference has consistently been found to be a significant predictor of SPE. Specifically, NSs and L2ers of different proficiencies use more overt pronominal subjects when the previous subject has a different referent than when it is coreferential (Linford & Shin 2013; Linford 2014). For instance, from the seven participant groups, only the two with the lowest proficiency were not sensitive to this variable in Geeslin et al. (2013) and Geeslin et al. (2015). Additionally, sensitivity to this factor has been shown to be restricted to L2ers with higher self-reported motivation and those with at least an intermediate proficiency level (Linford 2014). Prada Pérez & Feroce (2020) examined 1sg and 3sg subjects separately. They found that higher proficiency L2ers were sensitive to Switch Reference in both persons although with a smaller effect size in 1sg than the control group. In contrast, the lower proficiency L2ers were sensitive to this variable only in 3sg. The authors proposed that acquiring 3sg null subjects was more difficult due to null subjects being more marked in 3sg (Artstein 1999). Thus, both NS and L2 sensitivity to Switch Reference may differ across grammatical persons.

With respect to the variable Person, Abreu (2009) reported a significant effect for the variable in all her participant groups, consisting of 10 (functionally) monolingual Spanish speakers from Puerto Rico, 10 Spanish heritage speakers, and 10 Spanish L2ers. Although it was the highest ranked variable in both the NSs and the L2ers, there were differences between the groups in the specific persons that favored the use of overt pronominal subjects more. For NSs, the hierarchy of grammatical persons in descending order was 2sg, 1sg, 3sg, 1pl, and 3pl subjects. For L2ers the hierarchy in descending order was 3sg, 1sg, 2sg, 3pl, and 1pl. Linford & Shin (2013) also reported an effect for person, albeit only for the higher proficiency group, and Geeslin et al. (2013) and Geeslin et al. (2015) for two of the seven groups (7th semester and 4th year undergraduate students). These groups used significantly more overt pronominal subjects in 3sg than in 1sg, a trend also reported in Linford (2014) and Prada Pérez & Feroce (2020). Geeslin & Gudmestad (2008) also reported differences between L2ers and NSs in 3sg contexts and not in 1sg contexts, in particular with the L2ers' higher use of lexical subjects in 3sg than NSs. Long (2016) did not report Person as a significant variable (except for Level 2 and Level 4 L2ers). However, this may have been due in part to treating grammatical person and number as separate variables. Thus, it is likely that Person was not significant given differences in Number. For example, 3sg has been consistently found to favor overt pronominal subjects while 3pl does not. Since Abreu (2009), Linford (2014), Linford & Shin (2013), Long (2016), and Geeslin & Gudmestad (2008) did not perform separate analyses for different persons, it is not possible to know if the differences in rates across persons also indicated differences in the variables that were significant in 1sg vs. 3sg, for instance. Within the same dataset, subsequent papers by Gudmestad and colleagues examined the distribution of subject forms in 1st and 2nd person subjects (Geeslin & Gudmestad 2016) and in 3sg subjects (Gudmestad et al. 2013). Geeslin & Gudmestad (2016) compared 1st person and 2nd person subjects and concluded that NSs and L2ers were very similar in 1st person both in terms of rates of overt pronominal subject expression and in the variables that regulated the distribution. Gudmestad et al. (2013) examined 3sg subjects in the same dataset. They also reported differences between the NSs and L2ers, not in the alternation between null and overt pronominal subjects but in their use of other pronominal subjects and lexical subjects (see also Linford et al. 2016). Finally, Prada Pérez & Feroce (2020) also examined 1sg subjects separately from 3sg subjects, revealing that L2ers differed from NSs more in 3sg subjects in rates of use. They, however, were more similar to NSs in 3sg when examining the variables that regulated SPE.

With respect to the variable perseveration in L2 SPE, most studies have coded for the form of previous mention (Abreu 2009; Geeslin & Gudmestad 2011, 2016; Gudmestad et al. 2013; Long 2016), that is, the form of its referent, whether it was found in the immediately preceding clause or at a longer distance, while Linford (2016) and Zahler (2018) coded for the immediately preceding subject. Overall, these studies report an effect of perseveration in L2 groups that parallels that of NSs, i.e., speakers used more overt pronominal subjects when the previous subject was pronominal than when it was null.

#### **3 The present paper**

#### **3.1 Research questions and hypotheses**

The overarching question of this paper is: What are the effects of Switch Reference, Form of Previous Subject, and Speaker group on the SPE of Spanish L2ers and NSs? And how do these factors interact?

First, we examine whether speakers are sensitive to Switch Reference. It is expected that the NSs produce more overt pronominal subjects in contexts of different than in contexts of same reference. Additionally, it is expected that learners are sensitive to this variable as proficiency increases. Thus, an interaction between Switch Reference and Speaker group is anticipated, particularly for 1sg subjects (Prada Pérez & Feroce 2020).

With respect to the effect of perseveration or priming, based on previous L2 studies (e.g., Abreu 2009), we expect the variable Form of Previous Subject to be significant in our data and to have a similar effect across speaker groups. In line with Cameron (1994), a Switch Reference by Form of Previous Subject interaction is expected in our data, where priming was only observable in coreferential subjects.

Lastly, the role of Speaker group is examined through comparisons among the three groups of speakers (higher and lower proficiency L2ers, and a NS group) with respect to rates but also interactions with Switch Reference and Form of Previous Subject. Differences across proficiency groups have been reported in previous studies in terms of rates and conditioning factors, where L2ers, particularly those at lower proficiency levels, use overt pronominal subjects more than NSs (cf. Geeslin et al. 2013; Geeslin et al. 2015), particularly in 3sg subjects, and are not as sensitive to functional factors as NSs (Prada Pérez & Feroce 2020). Thus, we expect our data to show differences across the three speaker groups in rates of overt pronominal subjects and in sensitivity to Switch Reference (Switch Reference by Speaker group interaction). With respect to perseveration, however, we do not anticipate differences between the different speaker groups, as it is a mechanical factor.

#### **3.2 Participants**

A total of 28 native English speakers (19 females, age 19–24) and 9 bilingual NSs (8 females, age 18–22) in 3rd or 4th level Spanish courses were recruited from a public university in the U.S. All the NSs reported acquiring Spanish as children in their homes and were speakers of non-Caribbean dialects: Argentina (n = 2), Chile (n = 1), Colombia (n = 2), Ecuador (n = 1), El Salvador (n = 1), Nicaragua (n = 1), and Peru (n = 1). A total of six bilingual NSs were born abroad and migrated to the US at ages 5 (n = 1), 6 (n = 2), and 7 (n = 3). All the L2 participants reported learning Spanish after the age of 12 and in a classroom context. Nine of the L2 participants reported having studied abroad in either Costa Rica, Dominican Republic, Peru, or Spain (mean time abroad: 4.7 months, range: 2 weeks–30 months).

All participants, except for one L2er, completed an independent measure of proficiency. The NSs scored higher (15–45, M = 42/50, SD = 9.9) than the L2ers (23–45, M = 34/50, SD = 6.9). The L2 participants were separated into two groups based on a median split of their proficiency scores (Median = 34): higher proficiency group, or L2H, (n = 15) and lower proficiency group, or L2L, (n = 13). The participant who did not complete the proficiency measure was assigned to the lower proficiency group based on the perception of the authors in comparison with other participants in the study. The bilingual NSs were not further divided into two groups, given the small number of participants. Additionally, five scored above 40/50 and the other four were not perceived to be less proficient by the authors in the interview. Three of these four were born in a Spanish-speaking country and migrated to the US when they were either 6 (n = 2) or 7 (n = 1). To further support our perception, we measured the number of words in a minute in the recording (minute 10 to 11, M = 126, SD = 17.9) and those four participants with scores lower than 40/50 in the proficiency measure did not appear to be less fluent than the others, as can be observed in Table 1.
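A median split of this kind can be sketched as follows. The scores are invented for illustration, the function name is hypothetical, and the rule that scores equal to the median go to the higher group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def median_split(scores):
    """Split participants into higher and lower groups at the median score.

    `scores` maps a participant id to a proficiency score. Scores equal to
    the median are placed in the higher group; this tie rule is an assumption.
    """
    cut = median(scores.values())
    higher = sorted(p for p, s in scores.items() if s >= cut)
    lower = sorted(p for p, s in scores.items() if s < cut)
    return cut, higher, lower


# Invented scores out of 50, for illustration only.
example = {"A": 23, "B": 30, "C": 34, "D": 38, "E": 45}
cut, higher, lower = median_split(example)
print(cut, higher, lower)
```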


Table 1: HS proficiency and fluency results

As for their language use, the bilingual native speakers reported using Spanish every day, most of the day (n = 2), or sporadically throughout the day (n = 2), several times per week (n = 4), or once or twice per month (n = 1). In contrast, they all reported speaking English every day for most of the day. The Spanish L2ers reported using Spanish every day, sporadically throughout the day (n = 13), several times per week (n = 13), once per week (n = 1), or once or twice per month (n = 1). With respect to English, they speak it every day, most of the day (n = 24) or sporadically throughout the day (n = 2).<sup>1</sup>

<sup>1</sup>One of the L2ers only completed the proficiency measure, another one only completed the Language Background Questionnaire (LBQ), and another one did not complete the LBQ or the proficiency measure. During the interview some of their background was discussed, but we have no comparable measure of how often they use their languages.

We acknowledge that the number of participants, particularly in the bilingual NS group, is rather low, and may lead to a Type II error, where some trends may not reach significance due to the low number of participants. We also recognize the difficulty of comparing across studies when the comparison group is different from that used in previous studies. In order to address the comparative fallacy in L2 research (Bley-Vroman 1983), we chose to compare our L2 group to another Spanish-English bilingual group that they are comparable to in other ways (e.g. age, occupation, and state where they grew up).

#### **3.3 Materials and coding**

Participants completed a PowerPoint-guided sociolinguistic interview with the first author (a native Spanish speaker), a language background questionnaire (LBQ), and a proficiency test. In the LBQ, participants were asked to provide information on their personal history, their language history, their reported language use, and their self-reported proficiency. The Spanish proficiency test, with a total of 50 questions, consisted of a multiple-choice grammar section and a cloze test, based on the Diploma de Español como Lengua Extranjera (DELE), and is widely used in the field. The LBQ and the proficiency test were presented on the online platform Qualtrics and were completed after the interview. The interviews lasted between 30 and 60 minutes. Participants were asked to consider the interview an informal conversation. The interview data was transcribed and coded. Only 1sg and 3sg tokens in variable contexts were included in the analysis. Variables of interest to this study were: Subject form (null or overt pronominal subject), Person (1sg or 3sg), Switch Reference (same referent or different referent), and Form of Previous Subject (null, overt pronominal, lexical). For Form of Previous Subject, we did not limit previous subjects to be eligible based on functional factors, such as coreference or eligibility (e.g., lexical subjects, unlike Abreu 2009; Flores-Ferrán 2002; Travis & Torres Cacoullos 2012; Travis 2007) or animacy (in contrast with Otheguy 2015). Otheguy (2015) explained the motivation for a wider inclusion was to examine perseveration as a mechanical motivation. We also coded for one extra-linguistic variable: Speaker group (NSs, L2H, and L2L).
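The coding scheme can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names are hypothetical, "previous subject" is taken to be the subject of the immediately preceding clause, and the exclusion of non-variable contexts is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject_form: str  # "null", "overt", or "lexical"
    referent: str      # label for the subject's referent, e.g. "speaker"
    person: str        # "1sg", "3sg", or another person/number value


def code_tokens(clauses):
    """Code each 1sg/3sg null or overt subject against the preceding subject.

    Assumption: the previous subject is the subject of the immediately
    preceding clause, whatever its form or referent.
    """
    rows = []
    for prev, cur in zip(clauses, clauses[1:]):
        if cur.person not in ("1sg", "3sg"):
            continue
        if cur.subject_form not in ("null", "overt"):
            continue  # lexical subjects are not tokens of the dependent variable
        rows.append({
            "subject_form": cur.subject_form,
            "person": cur.person,
            "switch_reference": "same" if cur.referent == prev.referent else "different",
            "previous_form": prev.subject_form,
        })
    return rows


# Example (1): "Yo jugué fútbol ... y asistí a una escuela privada ..."
rows = code_tokens([
    Clause("overt", "speaker", "1sg"),
    Clause("null", "speaker", "1sg"),
])
print(rows)
```

Because the first clause has no preceding subject, it contributes no token here; only the null subject of the second clause is coded, as a same-reference token preceded by an overt pronoun.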

#### **3.4 Results**

#### **3.4.1 1sg**

Pronoun production rates for 1sg are presented in Figure 1 and were statistically analyzed via logistic mixed effects models using the function *glmer()* in R (R Core Team 2017). Separate analyses were done for 1sg and 3sg, given the different nature (deictic vs. referential) and behavior attested, e.g. different variables as significant (Prada Pérez & Gómez Soler 2020) and different patterns (Travis & Torres Cacoullos 2018). The results for the model below reveal the likelihood that a speaker would produce an overt pronoun over a null pronoun. We included all main effects and interactions between Switch Reference (Same referent, Different referent), Form of Previous Subject (Null pronoun, Overt pronoun, Lexical subject), and Speaker group (Natives, L2H, L2L), as well as random intercepts of Participant and Verb. To assess the overall contribution of the main factors and interactions, model comparisons were conducted by removing individual terms from the regression and comparing the reduced models against the maximal model. This revealed significant contributions of the main factors of Switch Reference (χ²(9) = 44.928, p < 0.001), Form of Previous Subject (χ²(12) = 25.836, p = 0.011), and Speaker group (χ²(12) = 22.378, p = 0.034), as well as the interaction between Switch Reference and Speaker group (χ²(6) = 12.979, p = 0.043). There were no significant reductions in model fit when removing the interactions between Switch Reference and Form of Previous Subject (χ²(6) = 5.160, p = 0.524), nor Form of Previous Subject and Speaker group (χ²(8) = 13.068, p = 0.110), nor the interaction between Switch Reference, Form of Previous Subject, and Speaker group (χ²(4) = 4.924, p = 0.295). This suggests that the variability in the data was primarily driven by the main factors and the interaction between Switch Reference and Speaker group. The maximal model was used to analyze statistical comparisons across conditions by releveling the reference levels for each factor. We report comparisons for each condition and group, but note that these should be taken with caution as there was no significant three-way interaction. We provide these comparisons to provide the most transparent description of the data as they raise theoretically interesting issues for future research to continue investigating, but we acknowledge that these are merely exploratory at present.
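For readers who wish to check how a model-comparison statistic maps onto its p-value, the following is a minimal sketch in Python rather than R. It assumes that each comparison is a likelihood ratio test, so that the statistic is referred to a chi-square distribution whose degrees of freedom equal the number of parameters dropped from the maximal model. The function name `chi2_sf` is hypothetical and uses only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df > 0.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Statistics reported above for three of the 1sg model comparisons.
for stat, df, reported in [(12.979, 6, 0.043), (13.068, 8, 0.110), (4.924, 4, 0.295)]:
    print(f"chi2({df}) = {stat}: p = {chi2_sf(stat, df):.3f} (reported {reported})")
```

Under that assumption, the computed tail probabilities agree with the reported p-values up to rounding.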

For Switch Reference, all speaker groups produced overt pronouns at a higher rate in different reference than same reference contexts, but this was modulated by the form of the previous subject. For the NSs, the effect of Switch Reference was significant when the previous subject was a null pronoun (β = 1.059, SE = 0.214, z = 4.947, p < 0.001) or a lexical subject (β = 1.378, SE = 0.616, z = 2.237, p = 0.025), but not when it was a pronoun (β = 0.771, SE = 0.602, z = 1.281, p = 0.200). For L2Hs, the effect of Switch Reference was marginally significant when the previous subject was a null pronoun (β = 0.339, SE = 0.197, z = 1.721, p = 0.085), but not when it was a pronoun (β = −0.025, SE = 0.396, z = −0.063, p = 0.950) or a lexical subject (β = 1.442, SE = 1.046, z = 1.379, p = 0.168). For the L2Ls, the effect of Switch Reference was significant when the previous subject was a null pronoun (β = 0.472, SE = 0.220, z = 2.148, p = 0.032) or an overt pronoun (β = 0.860, SE = 0.391, z = 2.202, p = 0.028), but not when it was a lexical subject (β = −0.663, SE = 0.872, z = −0.761, p = 0.447).
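As an aid to reading these coefficients, the sketch below shows how an estimate and its standard error relate to the z statistic, the two-sided p-value, and an odds ratio. It assumes that the reported tests are Wald z-tests on log-odds coefficients, which is the default output of a logistic mixed model; the function name is hypothetical. Small discrepancies from the reported z values are expected because β and SE are themselves rounded.

```python
import math


def wald_summary(beta, se):
    """Return (z, two-sided p, odds ratio) for a logit coefficient and its SE."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p, math.exp(beta)


# L2L group, previous subject null: beta = 0.472, SE = 0.220 (as reported above).
z, p, odds_ratio = wald_summary(0.472, 0.220)
print(f"z = {z:.3f}, p = {p:.3f}, odds ratio = {odds_ratio:.2f}")
```

On this reading, a coefficient of 0.472 corresponds to odds of an overt pronoun roughly 1.6 times higher in different reference than in same reference contexts for that group and condition.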

In same reference contexts, pronoun production rates were modulated by Form of Previous Subject, based on comparisons to a previous null subject. NSs did not produce pronouns at a significantly higher rate when the previous form was an overt pronoun (β = 0.507, SE = 0.343, z = 1.477, p = 0.140) or a lexical subject (β = −0.292, SE = 0.576, z = −0.507, p = 0.612). L2Hs produced pronouns at a higher rate when the previous subject was an overt pronoun (β = 0.697, SE = 0.253, z = 2.749, p = 0.006), but not when it was a lexical subject (β = −0.332, SE = 1.038, z = −0.320, p = 0.749). L2Ls did not produce a pronoun at a higher rate when the previous form was an overt pronoun (β = −0.703, SE = 0.870, z = −0.808, p = 0.419) or a lexical subject (β = 0.703, SE = 0.871, z = 0.807, p = 0.420).

In different reference contexts, pronoun production rates were also modulated by Form of Previous Subject, based on comparisons to a previous null subject. NSs did not produce more overt pronouns when the previous form was an overt pronoun (β = 0.219, SE = 0.539, z = 0.407, p = 0.684) or a lexical subject (β = 0.027, SE = 0.290, z = 0.094, p = 0.925). L2Hs produced more overt pronouns when the previous subject was a lexical subject (β = 0.772, SE = 0.238, z = 3.241, p = 0.001), but not when it was an overt pronoun (β = 0.333, SE = 0.360, z = 0.926, p = 0.354). L2Ls did not produce more overt pronouns when the previous form was an overt pronoun (β = 0.576, SE = 0.356, z = 1.617, p = 0.106) or a lexical subject (β = −0.245, SE = 0.308, z = −0.795, p = 0.426).

Pronoun production rates were also modulated by Speaker group. In same reference contexts, the L2Ls produced overt pronouns at a marginally higher rate than the NSs when the previous subject was a lexical subject (β = 1.890, SE = 1.110, z = 1.702, p = 0.089) and marginally produced more overt pronominal subjects than the L2Hs when the previous subject was a null pronoun (β = 0.836, SE = 0.436, z = 1.915, p = 0.055). In different reference contexts, the L2Ls produced overt pronominal subjects at a higher rate than the L2Hs when the previous subject was an overt pronoun (β = 1.212, SE = 0.598, z = 2.026, p = 0.043) or a null pronoun (β = 0.969, SE = 0.466, z = 2.082, p = 0.037). Additionally, the NSs produced overt pronouns at a marginally higher rate than the L2Hs when the previous subject was a null pronoun (β = 0.849, SE = 0.497, z = 1.707, p = 0.088).

Figure 1: Average percent pronoun rates for 1sg with 95% confidence interval bars.

#### **3.4.2 3sg**

Pronoun production rates for 3sg are presented in Figure 2 and were analyzed using the same methods as for 1sg. Model comparison analyses revealed significant contributions of the main factors of Switch Reference (χ²(9) = 78.246, p < 0.001), Form of Previous Subject (χ²(12) = 26.137, p = 0.010), and Speaker group (χ²(12) = 21.945, p = 0.038) as well as the interaction between Switch Reference and Form of Previous Subject (χ²(6) = 18.147, p = 0.006). There were no significant reductions in model fit when removing the interactions between Switch Reference and Speaker group (χ²(6) = 2.617, p = 0.855), Form of Previous Subject and Speaker group (χ²(8) = 6.965, p = 0.540), nor Switch Reference, Form of Previous Subject, and Speaker group (χ²(4) = 2.563, p = 0.633). This suggests that variability in the data was primarily driven by the main factors, as well as the interaction between Switch Reference and Form of Previous Subject. Similar to 1sg, comparisons across individual conditions for 3sg were analyzed based on releveling the reference levels for each factor in the maximal model.

For Switch Reference, all speaker groups produced more overt pronominal subjects in different reference than same reference contexts, but this was modulated by the Form of the Previous Subject. For the NSs, the effect of Switch Reference was significant when the previous subject was a null pronoun (β = 1.586, SE = 0.424, z = 3.740, p < 0.001), was marginal when it was an overt pronoun (β = 1.291, SE = 0.691, z = 1.868, p = 0.062), but was not significant when it was a lexical subject (β = −0.855, SE = 1.178, z = −0.725, p = 0.468). For the L2Hs, the effect of Switch Reference was significant when the previous subject was a null pronoun (β = 1.576, SE = 0.295, z = 5.346, p < 0.001), was marginal when it was an overt pronoun (β = 0.980, SE = 0.532, z = 1.842, p = 0.066), but was not significant when it was a lexical subject (β = 0.254, SE = 0.459, z = 0.553, p = 0.580). For the L2Ls, the effect of Switch Reference was significant when the previous subject was a null pronoun (β = 1.446, SE = 0.367, z = 3.943, p < 0.001), or an overt pronoun (β = 2.115, SE = 0.813, z = 2.602, p = 0.009), but not when it was a lexical subject (β = −0.253, SE = 0.606, z = −0.418, p = 0.678).
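The relation between each reported estimate, its standard error, the z value, and the p-value can be sketched as a Wald test: z is β divided by SE, and the two-sided p-value comes from the standard normal distribution. The snippet below is an illustrative sketch only, with a hypothetical helper `wald_test`. Because the published β and SE are rounded to three decimals, the recomputed values can differ from the reported ones in the last digit.

```python
import math


def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return the Wald z statistic and its two-sided p-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return z, p


# NS estimates for the effect of Switch Reference in 3sg, copied from above.
contrasts = {
    "previous null pronoun": (1.586, 0.424),
    "previous overt pronoun": (1.291, 0.691),
    "previous lexical subject": (-0.855, 1.178),
}
results = {label: wald_test(b, se) for label, (b, se) in contrasts.items()}
for label, (z, p) in results.items():
    print(f"{label}: z = {z:.3f}, p = {p:.3f}")
```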

In same reference contexts, pronoun production rates were modulated by Form of Previous Subject, based on comparisons to a previous null subject. NSs produced significantly more overt pronominal subjects when the previous form was an overt pronoun (β = 1.170, SE = 0.595, z = 1.965, p = 0.049) but not when it was a lexical subject (β = 0.614, SE = 0.619, z = 0.992, p = 0.321). L2Hs produced overt pronominal subjects at a marginally higher rate when the previous subject was a lexical subject (β = 0.654, SE = 0.341, z = 1.918, p = 0.055), but not when it was an overt pronoun (β = 0.426, SE = 0.281, z = 1.516, p = 0.129). L2Ls did not produce more overt pronominal subjects when the previous form was an overt pronoun (β = 0.038, SE = 0.346, z = 0.111, p = 0.912) or a lexical subject (β = 0.396, SE = 0.376, z = 1.055, p = 0.291).

In different reference contexts, pronoun production rates were also modulated by Form of Previous Subject, based on comparisons to a previous null subject. NSs did not produce overt pronominal subjects at a higher rate when the previous form was an overt pronoun (β = 0.874, SE = 0.550, z = 1.589, p = 0.112) but produced overt pronominal subjects at a marginally lower rate when the previous subject was a lexical subject (β = −1.826, SE = 1.085, z = −1.684, p = 0.092). L2Hs did not produce more overt pronominal subjects when the previous subject was an overt pronoun (β = −0.169, SE = 0.546, z = −0.310, p = 0.757), or a lexical subject (β = −0.668, SE = 0.420, z = −1.592, p = 0.111). L2Ls did not produce more overt pronominal subjects when the previous form was an overt pronoun (β = 0.707, SE = 0.817, z = 0.865, p = 0.387) but produced overt pronominal subjects at a lower rate when the previous subject was a lexical subject (β = −1.303, SE = 0.598, z = −2.178, p = 0.029).

Pronoun production rates were also modulated by Speaker group. In same reference contexts, the L2Ls produced more overt pronominal subjects than the NSs when the previous subject was a lexical subject (β = 1.723, SE = 0.698, z = 2.468, p = 0.014) or a null pronoun (β = 1.941, SE = 0.552, z = 3.519, p < 0.001). The L2Hs also produced overt pronominal subjects at a higher rate than the NSs when the previous subject was a lexical subject (β = 1.688, SE = 0.690, z = 2.447, p = 0.014) or a null pronoun (β = 1.649, SE = 0.523, z = 3.153, p = 0.002). In different reference contexts, the L2Ls produced more overt pronominal subjects than the NSs when the previous subject was a null pronoun (β = 1.800, SE = 0.532, z = 3.386, p = 0.001), and marginally more overt pronominal subjects when the previous subject was a lexical subject (β = 2.324, SE = 1.231, z = 1.888, p = 0.059) or an overt subject pronoun (β = 1.633, SE = 0.972, z = 1.680, p = 0.093). The L2Hs produced more overt pronominal subjects than the NSs when the previous subject was a lexical subject (β = 2.797, SE = 1.162, z = 2.408, p = 0.016) or a null pronoun (β = 1.638, SE = 0.497, z = 3.295, p = 0.001).

Figure 2: Average percent pronoun rates for 3sg with 95% confidence interval bars.

The trends reported with respect to Switch Reference and Form of Previous Subject per participant group are summarized in Tables 2 and 3. There was no significant three-way interaction for either 1sg or 3sg. Rather, the results for 1sg were driven by main effects and the interaction between Switch Reference and Speaker group, while those for 3sg were driven by main effects and the interaction between Switch Reference and Form of Previous Subject.


Table 2: Summary of results for the effect of Switch Reference for each participant group (*p*-values provided for marginally significant results)

Table 3: Summary of results for the effect of Form of Previous Subject for each participant group (*p*-values provided for marginally significant results)


#### **4 Discussion**

Differences between speaker groups were explored in rates, as well as patterns of use. With respect to rates of overt pronominal subjects, different trends were found for 1sg and 3sg subjects. For 1sg subjects, there was evidence that the L2Hs were overshooting the target. While their rates were comparable to those of the NSs in other contexts, they used significantly fewer overt pronominal subjects in different reference contexts (particularly with a previous null subject) than the NSs. The overuse of overt pronominal subjects was only attested in the L2Ls in certain contexts in our data. For 3sg subjects, the rates of both learner groups differed from the NSs in similar ways. This result is in line with previous research where L2ers are found to produce more overt pronominal subjects than NSs, except for the Geeslin and Gudmestad studies with near-native speakers, where they did not find differences. The proficiency of the L2H group is probably lower than the near-natives in these studies, explaining this difference.

To answer our research question on the effects and interactions of the variables under study, we turn to each variable. The results revealed interactions between the three variables, but no three-way interaction was reported either for 1sg or for 3sg subjects.

Regarding the effect of Switch Reference, we expected that the NSs would produce more overt pronominal subjects in contexts of different reference than in contexts of same reference. Additionally, it was expected that learners would be increasingly sensitive to this variable as proficiency increased. Switch Reference was significant both in 1sg and in 3sg, where speakers tended to use more overt subjects in different than in same reference contexts. However, this production was modulated by other factors. In 1sg, there was a Switch Reference by Speaker group interaction. While the results revealed that this interaction was present at all levels of the variable Form of Previous Subject, not all these contexts were as informative. Importantly, it is in the context of a previous null subject where we can observe more meaningful contrasts between the groups with respect to the variable Switch Reference. The difference among the groups is reflected in effect size: Switch Reference had a large effect for the NSs, a smaller effect for the L2Ls, and a marginal effect for the L2Hs. The results for the L2Hs are surprising, as we expected L2ers to improve their sensitivity to this variable as proficiency increased. As previously attested in the literature with respect to rates of overt pronominal subjects (Geeslin et al. 2013; Geeslin et al. 2015), the path of L2 acquisition is not necessarily straight. In this case, the underuse of overt pronominal subjects in the L2H group brought about a loss of sensitivity to Switch Reference in 1sg subjects. This is not the case, however, for 3sg subjects, where Switch Reference did not interact with Speaker group, suggesting that the use of overt pronominal subjects as a marker of a different referent was similar across the three groups of speakers. On the other hand, there was a significant interaction of Switch Reference with Form of Previous Subject for 3sg subjects. While Switch Reference was significant for all the groups when the previous subject was null, it was only significant for the L2Ls and marginally for the other two groups when the previous subject was pronominal. Thus, pronominal priming mitigated the Switch Reference effect. Additionally, Switch Reference was not significant for any of the groups when the previous subject was lexical. Since we only analyzed tokens with null or overt pronominal subjects and did not include lexical subjects, we cannot explore the hypothesis that lexical subjects might be preferred in different referent contexts after a lexical subject in the previous clause, in line with Lozano (2016: 256).

With respect to the effect of perseveration or priming, we predicted that Form of Previous Subject and its interaction with Switch Reference would be significant in our data. Additionally, as a mechanical factor, it was expected to have a similar effect across speaker groups and, thus, no interaction with Speaker group was anticipated. The results showed that Form of Previous Subject was indeed significant in our data. In 1sg data, there was evidence of pronominal priming. In 3sg subjects, there was a main effect of Form of Previous Subject but also an interaction with Switch Reference. Crucially, previous pronominal subjects favored the use of overt pronominal subjects more than previous null subjects for the NSs, which is evidence of pronominal priming, in contexts of same reference. However, this effect dissipated in contexts of different referent, a result in line with Cameron (1994). The trend to produce more overt pronominal subjects with a preceding overt pronominal subject did not reach significance in the learner groups, possibly due to the high rate of overt pronominal subjects in 3sg when the previous subject is null.

Lastly, for Speaker group, our results were consistent with our hypotheses. L2Ls produced significantly more overt pronominal subjects than NSs in 1sg. Additionally, the L2ers differed from NSs in sensitivity to Switch Reference. In 3sg, both learner groups used significantly more overt pronominal subjects than the NS group although their patterns of use paralleled those of NSs.

#### **5 Conclusion**

This paper further examined SPE in L2 Spanish in contrast to the Spanish of NSs, expanding on previous research by focusing on the interaction between variables that have previously received extensive attention: Switch Reference and perseveration. The main results reported in this paper were:


For perseveration, we coded for the Form of Previous Subject, instead of the form of previous mention, to make sure it was a mechanical factor. Contrasts of these two measures of perseveration should be performed in L2ers to be able to better compare across studies. A priming effect was reported across groups (in contrast with Otheguy 2015). Since our coding and case inclusion largely followed Otheguy's (2015) coding, we attribute this difference to a difference in analysis. While Otheguy (2015) used cross tabulations, our analysis examined the probability that an overt subject was followed by another overt pronominal subject vs. a null subject. Otheguy's (2015) analysis was more sensitive to the difference in overall frequency of null vs. overt pronominal subjects in the sample. Additionally, it collapsed across the category of different previous subjects rather than different subject forms (nulls, lexical, other types of subjects when the subject is overt pronominal in the second verb), which, as our data revealed, behave quite differently from each other. Our results align with Cameron's (1994) data from NSs in that the priming effect is stronger in contexts of same reference than in contexts of different reference. We argue that the examination of the interaction between mechanical and functional factors can better help us understand the acquisition path of SPE in Spanish as an L2. Our data revealed that Spanish L2ers differed from NSs in rates of overt pronominal subjects (particularly in 3sg subjects), as well as in patterns of use, as exemplified by the functional variable, Switch Reference (particularly in 1sg subjects). Crucially, the interaction between Form of Previous Subject and Switch Reference revealed that in contexts where those two factors may favor a different form, the effect of the other variable is obscured. For example, with different reference contexts, the effect of pronominal priming is obscured. Similarly, in contexts where there is a preceding overt pronominal subject, the effect of Switch Reference is weakened. The effect of priming, or its interaction with Switch Reference, was similar across the different groups of speakers. Overall, thus, our data indicated that differences between learners and NSs were restricted to rates and functional variables and were different in 1sg and 3sg subjects. We explain these differences based on functional factors, mainly discursive differences between 1sg, as a deictic person, and 3sg, as a referential person, and the saliency of these features in 3sg vs. 1sg. Our data is limited in scope (only including 1sg and 3sg data, only immediately preceding subject priming, and a limited number of learners as well as a NS group composed of heritage speakers) and, thus, invites further research expanding the persons included in the analysis, contexts of priming, and number and variety of participants. While the choice of the bilingual NS group may obscure comparisons with previous research, we argue that it is a welcome approach to minimize the comparative fallacy in L2 research (Bley-Vroman 1983). We also presented some caveats related to our analysis and results: the low number of participants could have led to a Type II error, the reporting of trends that did not reach significance (reporting on marginal significance) can lead to a Type I error, and all pairwise comparisons were reported even though there was no three-way interaction. These choices were made so as to provide as much information as possible on the patterns attested in our data. Additionally, this paper contributes to the SLA literature by including data from a novel group of speakers and a new focus on the interaction of mechanical and functional factors and their different roles in SLA.

#### **Acknowledgements**

The second author of this paper was supported by an NIDCD T32 training grant (5T32DC000052).

#### **References**

Abreu, Laurel. 2009. *Spanish subject personal pronoun use by monolinguals, bilinguals, and second language learners*. Gainesville, FL: University of Florida. (Doctoral dissertation).


(eds.), *Selected proceedings of the 2007 Second Language Research Forum*, 69–85. Somerville, MA: Cascadilla Proceedings Project.


## **Chapter 15**

## **Frequency and efficiency in Spanish proverbs**

Ernesto R. Gutiérrez Topete<sup>a</sup>

<sup>a</sup>University of California, Berkeley

Zipf's law states that there is an inverse relationship between a word's length and its frequency; the more frequent words tend to be the shortest. Following this premise, I investigate if this universal property in language can extend to domains beyond the word. As such, the present study analyzes the use of proverbs, a specific type of fixed expressions, in a Spanish corpus of news language. The motivation for this study is to determine if more frequent proverbs are more likely to be shortened, relative to their lower-frequency counterparts. The results of this study indicate that there is a positive correlation between a proverb's general frequency in a corpus and its reduction rate. This paper argues that usage-based models of language representations are better equipped to account for the mechanisms used in the production of proverbs, compared to the traditional view of fixed expressions as enlarged single words.

### **1 Introduction**

For interlocutors, communication is a trade-off between the speaker's production effort and the amount of information that the listener receives. That is, if the speaker provides less information to minimize the amount of effort required to articulate the signal, the listener will be faced with a bigger burden in trying to sort through the ambiguity. In trying to strike a balance, languages tend to reduce the articulation of more frequent words at the segment and syllable levels. For example, in Spanish we find that *grillo* → [ˈgri.o]; *después* → [de.ˈpu͡es]; and in English we find that *memory* → [ˈmɛm.ɹi]; and *information* → [ˈɪn.foʊ] (Lipski 1990, Brown 2008, Bybee 2002b, Mahowald et al. 2013). Such a phenomenon is encompassed in Zipf's law, which states that there is a negative correlation between a word's frequency and its length in any given corpus of written language or naturally produced speech (Zipf 1936, 1949).

Usage-based models of language representation and use – for instance, as described by Bybee (2001) and Pierrehumbert (2001) – argue that there is a reciprocal relationship between the mental representation of language (i.e. the grammar) and its actual use. In other words, "usage feeds into the creation of grammar just as much as grammar determines the shape of usage" (Bybee 2006: 730). Furthermore, they postulate that language users store in exemplar clouds information related to the frequency and context in which the linguistic item was used, with more recent and/or frequent items being activated and retrieved more easily. Accordingly, research on frequency and context of use of linguistic structures finds that more frequent items are treated differently: more frequent words are learned sooner, recognized more quickly, articulated more easily, and perceived as more grammatical. These effects are seen in linguistic areas such as language change (Phillips 1984, 1999, Bybee 2000, 2001, 2002c), psycholinguistics (Vitevitch & Luce 1998, Vitevitch 2002), language variation (Ellis 2002a,b), language acquisition (Bybee 2002a, Erker 2011, File-Muriel & Brown 2010), and more.

When studying usage-based effects, linguists have primarily focused on the role of the speaker, paying particular attention to the phonetic/phonological level (i.e. sounds and clusters), as Bybee (2002b) has outlined. Other domains such as the intersection of frequency and fixed expressions have not been as rigorously studied. Fixed expressions are of particular interest because of their composition and representation. While fixed expressions have traditionally been regarded as representing a single unit in the mind of a speaker (see Sinclair 1991), the hypothesized exemplar clouds in the usage-based models "may in principle allow speakers to deduce what words are likely to cooccur, with what probability, formulating probabilistic generalizations well beyond the level of the fixed phrase …" (Erker & Guy 2012: 528). Thus, the stored information about language context, use, and co-occurrence of lexical items in collocations such as fixed expressions provides us with linguistic environments where Zipf's law can be tested – beyond the cluster, syllable, and lexical levels.

In this study, I analyze the effect that occurrence frequency has in a particular type of fixed expressions: the proverb. Proverbs are linguistic constructions that express, in a phrase or complete sentence, a specific idea – a metaphor that contains "a good dose of common sense, experience, wisdom, and above all truth" (Mieder 1989: 15). I focus the present study on the use of proverbs in a corpus of the language used in the news media. Admittedly, the language in the news industry may be considered less 'natural language' because of the formulaic crafting behind it. Nonetheless, it was selected for this study due to the fact that proverbs are very frequent in this genre owing to their economy: "[proverbs] let journalists rapidly explain something new via something known" (McLaughlin, personal communication, June 11, 2020). More precisely, this study investigates if more frequent fixed expressions such as proverbs are more likely to undergo articulatory reduction (e.g. *People who live in glass houses should not throw stones*. → *People who live in glass houses*.), as is the case with other linguistic domains (e.g. clusters, syllables, and lexical items). I hypothesize that more common phrases will experience higher rates of truncation in order to maximize efficiency in production. These results will help us further understand the extent to which Zipf's law applies to fixed expressions in (written) language, a novel linguistic domain.

The remainder of this paper is structured as follows: §2 explains frequency effects and their applicability to proverbs through the lens of usage-based models and Zipf's law. This section ends with a discussion of the use and prevalence of proverbs in the news media. §3 describes the present study, and §4 presents the results. Finally, this paper ends with a discussion of the findings of this study and a conclusion in §5 and §6, respectively.

#### **2 Frequency, proverbs, and the news media**

#### **2.1 Zipf's law**

Zipf's law (Zipf 1936, 1949), though not unchallenged (see for example Bentz & Ferrer-i-Cancho 2016), has long been held as a universal property in human language. In fact, it has been denominated "the most well known statement of quantitative linguistics" (Montemurro 2001: 1). This law states that there is a negative correlation between a word's frequency and its length in any given corpus of written language or naturally produced speech. In his own words: "the magnitude of words tends, on the whole, to stand in an inverse (not necessarily proportionate) relationship to the number of occurrences" (1936: 25). The inverse relationship between a word's length and its relative frequency is robust and has been corroborated cross-linguistically by the countless studies that have examined this phenomenon (e.g. Bates et al. 2003, Ferrer-i-Cancho & Hernández-Fernández 2013, Piantadosi et al. 2011, Strauss et al. 2007, Wimmer et al. 1994, Zipf 1949). Furthermore, this pattern has also been found in other natural and/or non-human systems (e.g. data with variable sequence length, music, computer systems, etc., Aitchison et al. 2016).
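One way to make the inverse length–frequency relationship concrete is to compute a rank correlation between word length and token frequency. The following minimal sketch does so for a short invented Spanish sentence. The toy text, the function names, and the choice of a Spearman-style rank correlation are illustrative assumptions, not the procedure of any of the studies cited above, and a sample this small only shows the direction of the relationship.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def length_frequency_correlation(tokens):
    """Rank correlation between word length and frequency over word types."""
    counts = Counter(tokens)
    words = list(counts)
    lengths = [len(w) for w in words]
    freqs = [counts[w] for w in words]
    return pearson(average_ranks(lengths), average_ranks(freqs))


# Invented toy text (an assumption for illustration, not corpus data).
toy_text = "el perro y el gato ven a la vecina y la vecina saluda al perro desde la ventana"
rho = length_frequency_correlation(toy_text.split())
print(f"rank correlation between length and frequency: {rho:.2f}")
```

A negative value of `rho` indicates that longer word types tend to be less frequent in the sample.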

The principle of abbreviation in human language, Zipf claimed, stems from a need for communicative efficiency, reducing the effort in the production of the more common words in language. This observation is the basis for his *principle of least effort* – "the primary principle that governs our entire individual and collective behavior of all sorts, including the behavior of our language" (Zipf 1949: "Preface," paragraph 22). Ferrer-i-Cancho & Hernández-Fernández explain that the burden of communication falls on both the speaker and the hearer, which expands to many levels of the communication process. At the phonological level, the speaker wants to minimize the amount of articulated language, whereas the hearer wants to receive as much information in the signal as possible in order to decode the message (Pinker & Bloom 1990, Köhler 1986). At the lexical level, the hearer wants the most informative word (Köhler 1986, Zipf 1949). Yet, according to Köhler (1986), as cited in Ferrer-i-Cancho & Solé (2003), speakers tend to resort to the most common words to minimize communicative effort when speaking. Subsequently, the most frequent words acquire more uses (i.e., meanings), rendering these words more ambiguous than their less frequent counterparts (Ferrer-i-Cancho & Hernández-Fernández 2013: 788). What is easiest for the speaker yields a more complicated situation for the hearer and vice versa, thus prompting the need for a compromise between the two. Zipf refers to this efficiency trade-off as the principle of least effort.

#### **2.2 Fixed expressions**

Collocations or multi-word expressions are peculiar linguistic structures. On the one hand, they are composed of multiple words, creating whole phrases or even sentences. On the other hand, these chunks, as a whole, display similar frequency effects as lexical items; for instance, children can utter higher frequency multiword phrases faster and "better" (Bannard & Matthews 2008), and adults are able to process these collocations more quickly (Arnon & Snider 2010). These phrases behave similarly to individual lexical items, and for that reason, they have traditionally been regarded as representing a single unit in the mind of a speaker. As Erker & Guy (2012) describe, traditional linguistics referred to fixed phrases or idioms such as *to kick the bucket* as enlarged lexical items, basically working as long words. This idea is notably encompassed in Sinclair's *idiom principle*, which asserts that "a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments" (Sinclair 1991: 110).

Gramley & Pätzold (1992) took Sinclair's principle further and created a systematic classification of fixed expressions. In their work, the authors categorized collocations, idioms, pragmatic idioms, and proverbs as separate sub-categories of fixed expressions. The authors attributed to the proverb: (a) the capability of expressing a speech act such as a promise, warning, request, etc.; (b) the nature of constituting a complete sentence; and (c) the characteristic of idiomaticity, where its actual meaning cannot be deduced by the denotation of its individual words (48–9). Nonetheless, Gramley & Pätzold (1992) note that proverbs are not necessarily 'fixed.' They elaborate that "proverb collections often list a number of variant forms, which shows that variability is a characteristic trait of proverbs" (1992: 60). In fact, proverbs are made up of complete sentences but are not always produced as such: "They are so well known that even fragments and mutations are easily associated with the full form," unlike most idioms, "which would become meaningless if changed in this way or allow only a literal reading" (1992: 60–61).

This leads to a conceptual conflict: a proverb has been considered *a single, enlarged word*, but unlike other types of fixed expressions, it can also have several variants – including fragmentations of the full form – while maintaining its meaning and still displaying similar lexical effects as individual lexical items. Now, the question that arises is: how can we conceptualize the mechanisms that allow for similar frequency effects of individual words and proverbs? In other words, if proverbs are treated as single words, should frequency effects of individual words be ignored or discarded? Alternatively, should lexical effects of individual words contained in the proverb be considered or prioritized in analyzing the frequency effects observed in this type of fixed expressions? As discussed below, the postulates regarding contextual and collocational information put forth by usage-based models allow for a representation of proverbs in a more uniform way.

#### **2.3 Frequency in collocations**

Usage-based and exemplar models (e.g. Johnson 1997) postulate that speakers and listeners store vast information about the linguistic items (e.g. sounds and words) they encounter in exemplar clouds (i.e. memory clusters). These exemplar clouds, then, help generalize these items into categories that can be used as production targets during speech articulation. As Erker & Guy (2012) explain, "[these] models further postulate that collocational information is retained in and deducible from memories of linguistic experience" (2012: 258). That is, exemplar clouds of sounds and words may be extended beyond this level, providing higher activation levels for words that co-occur frequently, where the articulation of a portion of a frequent collocation will evoke the remainder of this phrase from the listener's exemplar cloud. Additionally, more than one item may be activated by a previously activated/uttered item, giving the speaker more than one option to choose from, which may yield multiple variants for a particular phrase.

In their study of the use of subject pronoun + verb collocations in Spanish, Erker & Guy (2012) found that pronouns were omitted at a higher rate when they co-occurred with morphologically irregular and external activity verbs that had a higher general frequency. The opposite effect occurred with morphologically regular and mental activity verbs. From these results, the authors conclude that: (a) frequency does not affect language independently from other factors or constraints, and (b) frequency effects cannot be described simply as unidirectional.

#### **2.4 Proverbs in news language**

As described earlier, news language is not the quintessence of naturally produced language, but it provides us with an open field to explore the usage of proverbs. The language used in the news media industry has been characterized as a crafted language that employs certain devices or tools in a stylized manner (Mouriquand 1997, Conboy 2013). Among these tools we find anecdotes and proverbs, which are used with a similar purpose in mind: "[l]'objectif [des anecdotes] est à la fois de raconter une très courte histoire qui rende la lecture agréable et de donner à comprendre ce qui, conceptualisé, serait trop complexe" ('The purpose [of anecdotes] is both to tell a very short story that makes reading enjoyable, and to provide an understanding of what, conceptually, would be too complex') (Mouriquand 1997: 97). Similarly, "[l]e recours au proverbe, à la formule populaire présente l'intérêt de faire référence à un sens tout à fait clair dans l'esprit du lecteur et ainsi de faciliter la compréhension dans un passage difficile" ('The popular use of the proverb offers the consideration of referring to a meaning completely clear in the reader's mind, thereby facilitating comprehension of a difficult passage') (1997: 102).

The study of news language has traditionally focused on or, at the very least, started with the analysis of written newspapers. However, Conboy (2013) reports that news delivered in other types of media is often influenced by the style used in written newspapers. As such, all journalistic language, regardless of mode of delivery, may be attributed to the same genre, including spoken news on TV. Thus, the careful formulation of news language renders it non-natural, distinct from spontaneous speech. However, it is on a par with other forms of written language, such as books and magazines, two sources of written language to which Zipf's law also applies. Therefore, we can and should expect that this genre will render a form of language that resembles written language and that, as indicated above, contains proverb data that *is ripe for the picking*.

To my knowledge, proverbs in the Spanish news media have not been broadly studied from a linguistics perspective. However, it is well known that some of the most typical tools used in this industry, including the ones described by Mouriquand (1997), are widely used in this genre cross-linguistically. Thus, the present study works under the assumption that the characteristics of the news media described here will also apply to the news media industries in the Spanish-speaking world.

#### **3 The present study**

Although linguistic reduction has been well documented in many (non)linguistic domains and countless forms of communication, certain domains such as fixed expressions – and more specifically, proverbs – remain largely unexplored. The present study investigates whether or not more frequent proverbs are more likely to undergo linguistic reduction. Adhering to usage-based models and exemplar theory, I work under the premise that instances of linguistic items to which a speaker-listener is exposed are stored in the mind, in conjunction with all their relevant use and context information. As such, I assume that more frequent proverbs will benefit from the frequency effects observed in other domains such as lexical items; that is, they will be more easily recalled and/or evoked. Said differently, I hypothesize that proverbs will be governed by Zipf's law, demonstrating higher rates of shortening for more frequent items. Note that I will use the terms "reduction", "shortening", and "truncation" interchangeably in this paper.

#### **3.1 Materials**

The data used for this study came from the News on the Web (NOW) corpus, from the Corpus del Español, which at the time of data collection included over 7.2 billion words in Spanish (Davies 2016). The corpus is composed of web-based newspapers, magazines, and their respective comments sections, if available, thus rendering more formal (i.e. written news content) and less formal (i.e. naturally produced user comments) sources of language. The data was collected from 2012 to the present, and the corpus covers the press in all Spanish-speaking countries, including the United States (US). The news media was selected because of its high use of proverbs, as mentioned above, and because its parsimonious nature provides an environment where linguistic reduction is commonplace, compared to other corpora of written language. Additionally, the NOW corpus was selected because it provides a large corpus of data that spans Latin America, Spain, and the US.

A list of 30 target proverbs was selected from Pedicone de Parellada (2004), a collection of sayings commonly used in the press in Tucumán, Argentina (see Appendix A). I followed the definition of proverbs provided in Gramley & Pätzold (1992), where proverbs are described as expressions that (a) sum up situations and provide advice or a moral, (b) are metaphorical, and (c) may have variable forms. The proverbs were stratified by frequency. Each proverb was labeled as either high frequency or low frequency, using a median split approach, based on their general frequency in the NOW corpus. The 30 expressions analyzed in this study are those from Pedicone de Parellada (2004) that had a frequency of at least 1 – in their long form – in the NOW corpus. It was important for the target proverbs to exist in their long form so that the process of shortening could be definitively demonstrated. Pedicone de Parellada's book was used to compile the materials of interest because it provided a list of proverbs that have been attested as commonplace in the contemporary Spanish-speaking media,<sup>1</sup> which aligned with the timeframe for the data collected in the NOW corpus. The present study, therefore, provides a synchronic observation of the behavior of proverbs in the news media.
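The median split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the proverb names and counts are hypothetical placeholders rather than values from the NOW corpus, the function name `median_split` is invented here, and the choice to label a proverb whose frequency equals the median as low frequency is an assumption that the study does not specify.

```python
from statistics import median


def median_split(frequencies):
    """Label each proverb 'high' or 'low' relative to the median frequency.

    Assumption: a proverb whose frequency equals the median is labeled 'low'.
    """
    cutoff = median(frequencies.values())
    return {
        proverb: ("high" if count > cutoff else "low")
        for proverb, count in frequencies.items()
    }


# Hypothetical counts, for illustration only (not taken from the corpus).
toy_frequencies = {
    "proverb_a": 120,
    "proverb_b": 60,
    "proverb_c": 40,
    "proverb_d": 3,
}

labels = median_split(toy_frequencies)
for proverb, label in labels.items():
    print(proverb, label)
```

With an even number of items, as in the 30-proverb list, `statistics.median` returns the mean of the two middle values, so the cutoff falls between the two groups.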

#### **3.2 Procedures**

As the NOW corpus only allows for an interaction with the data through its interface, the data collection and analysis were conducted manually. In addition to the aforementioned criterion of one or more long-form occurrences in the corpus, the proverbs needed to be composed of two or more syntactic phrases (e.g. NP, VP, PP, etc.), as this facilitated the operationalization of proverb length, where shortening was considered present if at least one whole syntactic phrase had been omitted. Each occurrence of a proverb in the corpus was documented as an observation, and for each observation, the dependent variable was recorded as a decision between the binary shortened or not shortened outcome, according to the form in which the token appeared.

<sup>1</sup>Although this collection of proverbs comes from the Argentinian press, all proverbs are present in other varieties of Spanish; in fact, being a speaker of Mexican Spanish, I have a high level of familiarity with most of these proverbs. For that reason, these proverbs collected from the Argentinian press were analyzed in the NOW corpus, a multi-dialectal Spanish corpus.

Deletion, *not substitution*, was required for shortening to be counted as present. An example of reduction is a sentence with a PP and an NP which has the second phrase truncated, such as [*a buen entendedor* PP][*pocas palabras* NP]→[*a buen entendedor* PP] 'a word to the wise'. Tokens in the dataset that had a syntactic phrase substitution or variants that differed from the standard form presented in Pedicone de Parellada (2004) were counted as long-form tokens (i.e. not shortened). For instance, [*a buen entendedor* PP] [*sobran explicaciones* VP] was considered a substitution, and thus not shortened. Moreover, *agua que no has de beber, déjala pasar* was considered a variant of *agua que no has de beber, déjala correr* 'don't be the dog in the manger', and thus not shortened. To ensure that all possible variants of a given proverb were included in the analysis, all its content words were searched in the corpus, and all data that had at least one complete syntactic phrase belonging to the proverb were collected.
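The coding rule above (deletion of at least one whole syntactic phrase, with substitutions and variants counted as long form) can be expressed as a small Python sketch. It assumes that each token has already been segmented by hand into syntactic phrases and that phrases are compared as exact strings; both the function names and this representation are hypothetical and are not part of the study's manual procedure.

```python
def is_subsequence(token, standard):
    """Check that every phrase of the token occurs in the standard form, in order."""
    position = 0
    for phrase in token:
        try:
            position = standard.index(phrase, position) + 1
        except ValueError:
            # The phrase is not part of the standard form: a substitution or variant.
            return False
    return True


def is_shortened(standard, token):
    """Return True only if at least one whole phrase was deleted and none was substituted."""
    if not token or len(token) >= len(standard):
        return False
    return is_subsequence(token, standard)


# Standard long form, segmented into phrases as in the example above.
STANDARD = ["a buen entendedor", "pocas palabras"]

print(is_shortened(STANDARD, ["a buen entendedor"]))                          # deletion
print(is_shortened(STANDARD, ["a buen entendedor", "sobran explicaciones"]))  # substitution
print(is_shortened(STANDARD, ["a buen entendedor", "pocas palabras"]))        # long form
```

Under these assumptions, only the first token counts as shortened; the substitution and the full form are both coded as not shortened, mirroring the binary dependent variable.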

#### **3.3 Analysis**

A total of 2,997 proverb occurrences were recorded in the corpus. Repeated tokens from articles that were published in multiple outlets (i.e. re-published articles) and tokens that made references to creative works (e.g. song or film titles) were excluded from the analysis. An analysis of frequency type showed that high frequency proverbs had an average general frequency of 187.3 occurrences (SD=123.4) and a range of 54–503 occurrences, while lower frequency proverbs had a mean of 12.5 (SD=17.5), with a range of 1–51 occurrences. A *t*-test reported that the low and high frequency groups were significantly different from one another in regard to corpus occurrence (*df* = 28, *p* < 0.001). Furthermore, general corpus frequency and shortening frequency/rate per token type were calculated; the latter measurement refers to the rate of shortened over total number of occurrences for each proverb. A fixed-effects logistic regression model analyzed the dependent variable (i.e. whether an observation was shortened or not) as a function of general frequency in R (R Core Team 2018) using the glm() function. No other independent variables were analyzed in the statistical model.

#### **4 Results**

Across the board, proverbs in their long form were more common than shortened tokens: 2,388 to 609. Furthermore, the ratio for shortening ranged from 0% to 44.4%, indicating that, for any given proverb, there were more tokens in its long form than there were shortened tokens. Figure 1 shows the rate of shortened tokens per proverb for each of the 30 proverbs analyzed in this study over their general frequency in the NOW corpus.
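As a quick check on these totals, the shortening rate (shortened tokens over all occurrences) can be computed directly from the two counts reported above. The helper name `shortening_rate` is hypothetical; the same ratio applies per proverb.

```python
def shortening_rate(shortened, total):
    """Rate of shortened tokens over the total number of occurrences."""
    return shortened / total


long_form = 2388   # long-form tokens reported above
shortened = 609    # shortened tokens reported above
total = long_form + shortened

overall_rate = shortening_rate(shortened, total)
print(total, round(overall_rate, 3))
```

The two counts sum to the 2,997 occurrences recorded in the corpus, and the overall shortening rate across all proverbs comes to roughly 20%.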

Figure 1: Rate of shortened tokens per proverb, given its general frequency in the NOW corpus. This graph includes the regression line.

The results of the logistic regression model, shown in Table 1, indicate that general frequency is a strong predictor of shortening rate (*p* < 0.001). The conversion of the model's log-odds coefficients into probabilities reveals the model's prediction that, at frequency 1, proverbs will display a reduction rate of 11.9%. Furthermore, as frequency increases by one unit, the probability of reduction is estimated to increase by approximately 0.025 percentage points. That is, the model predicts that for a proverb that has a general frequency of 503 occurrences – the highest general frequency for the proverbs in this dataset – the shortening rate will increase to approximately 31.2%. These predictions are visualized in Figure 2, which displays the probability of reduction given proverb frequency.
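The conversion between log-odds and probabilities can be illustrated with a short Python sketch. The intercept and slope below are not the coefficients from Table 1: they are back-derived, as an assumption, from the two predicted rates reported in the text (11.9% at frequency 1 and 31.2% at frequency 503), so they only approximate the fitted model up to rounding. All function and variable names are hypothetical.

```python
import math


def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))


# Predicted reduction rates reported in the text.
P_AT_1 = 0.119
P_AT_503 = 0.312

# Back-derived (approximate) coefficients on the log-odds scale.
slope = (logit(P_AT_503) - logit(P_AT_1)) / (503 - 1)
intercept = logit(P_AT_1) - slope * 1


def predicted_rate(frequency):
    """Predicted probability of shortening at a given general frequency."""
    return inv_logit(intercept + slope * frequency)


# Local change in probability per unit of frequency, evaluated at frequency 1.
marginal_at_1 = slope * P_AT_1 * (1 - P_AT_1)

print(round(predicted_rate(1), 3), round(predicted_rate(503), 3))
print(round(marginal_at_1 * 100, 3))  # in percentage points
```

Because the model is linear in log-odds rather than in probability, the per-unit increase is not constant: it is about 0.025 percentage points near frequency 1 and grows as the predicted probability moves toward 50%.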


Table 1: Logistic regression model of proverb shortening

Figure 2: Probability of reduction over frequency increase for all 30 proverbs.

In sum, these results indicate that shortening shows a strong effect of frequency, adhering to Zipf's law. However, there were two particular outliers:


The former had a general frequency of 15 occurrences with a 40% reduction rate, whereas the latter had 45 occurrences and a reduction rate of 44.4%. While these proverbs were labeled as low frequency, they were among the three most truncated proverbs. These outliers, which had an even higher shortening rate than the one reported for the most frequent proverbs, signal that frequency alone is not sufficient to account for this phenomenon. The first potential explanation for this peculiarity makes reference to syntactic length or complexity. At first glance, however, this explanation fails to hold water. The two outliers, repeated below with delineated syntactic structures, have comparable reduction rates but starkly distinct syntactic complexity.


The proverb in (3), evidently, is composed of a higher number of syntactic phrases relative to (4). Yet, it is also true that both proverbs contain complex structures (e.g. [*como la gata flora* PP] for the former and [*agua que no has de beber* NP] for the latter). It is possible that their more complex structures – even if only applicable to some of their phrases – may incentivize language users to truncate these proverbs in order to reduce the production effort, be it in speaking or writing. This would clearly fit within the expectations put forth by Zipf's law. As such, a post-hoc analysis that included (a) the total number of syntactic phrases and (b) the number of complex syntactic phrases per proverb was performed to identify those factors as potential indicators of reduction rate as a way to account for these two outliers. A statistical model that included the number of syntactic phrases and the number of complex syntactic phrases as non-interacting independent variables, shown in Table 2, found that these factors were not significant (*p* = 0.82 and *p* = 0.08, respectively). The inclusion of these independent variables into the model does not change the effect observed for general frequency (*p* < 0.001). The next section provides a discussion of these results and elaborates on the factors considered in this study as potential predictors of proverb shortening rates.

Table 2: Logistic regression model of syntactic composition


#### **5 Discussion**

To summarize the results of this study, proverbs appear to display higher rates of reduction as their general frequency in the NOW corpus increases. Therefore, proverbs seem to be governed by frequency effects similar to those that have been found in lexical items and other linguistic domains. This link suggests that, as proverbs become more frequent and thus easier to recall, speakers are more likely to shorten the production of these phrases. All in all, these results are a robust corroboration of Zipf's law, indicating that more frequent proverbs are more likely to be reduced.

The post-hoc statistical analysis presented above also reports that the number and complexity of syntactic phrases in these proverbs do not predict truncation. These results are corroborated by a further inspection of the rest of the data.

For instance, consider (5) and (6). The proverb in (5) had a general frequency of 286 occurrences with a shortening rate of 20.6%, while the proverb in (6) had 5 tokens with 0 shortened variants. Considering that these two sentences have a remarkably similar syntactic structure, if syntactic complexity was correlated with shortening rate, it would be expected that they display comparable reduction rates, but this is not the case. The pattern seen between (3) and (6) suggests that syntactic complexity does not favor a particular outcome when it comes to linguistic reduction, or at least not forthrightly.


Considering that syntactic complexity does not appear to be directly correlated with linguistic shortening of proverbs, an additional analysis of variability seemed justified. Accordingly, I qualitatively analyzed the number of variants that were present for each of the thirty proverbs in the dataset, starting with (3)–(6). However, from the outset, no clear pattern was detected. For instance, (3), a low frequency proverb with a high shortening rate, displayed a high degree of variability. The 15 occurrences found in the corpus were made up of a total of 8 variants (see some of these variants under (7)) other than the standard form shown in Pedicone de Parellada (2004). Conversely, (5), a proverb with 286 tokens, only had 12 recorded variants.

(7) a. *Como la gata flora: si se la ponen gritan y si se la sacan lloran.*
Like the cat flora: if to-her it they-give they-scream and if to-her it they-take-away they-cry
'There's no pleasing [someone]'


While the rate of variability may appear to explain the outlier status of (3) and the compliance of (5), it fails to account for (6), which only had 5 tokens and 0% shortening; this example was composed of a total of three variants – more than 50% divergence from the standard form. Conversely, (4), with 45 occurrences and nearly 45% shortening, has only two variants present in the dataset. The shortening rates of (3) and (6), two low frequency proverbs, are in contrast with each other.

All in all, this study demonstrates that there is an efficiency trend in proverb production. The fact that this trend materializes at all in language use beyond the lexical level – i.e. in fixed expressions – signals that: (a) Zipf's law can be applied to linguistic items beyond the lexical item; (b) speakers and listeners can negotiate a trade-off between production effort and informativeness in proverbs, avoiding ambiguity; and (c) the language from the news media, albeit "not natural", may be used to study natural linguistic phenomena and general properties of human language, provided that this genre allows for the expression of these phenomena. Additionally, the present study sheds light on the traditional view of fixed expressions as enlarged lexical items, suggesting that usage-based models are better able to account for the frequency effects observed here.

First, these results are indicative of proverbs' adherence to Zipf's law in written news language. Zipf's law originated from an observation that the most common words in language also tend to be shorter. In fact, the more common the word, the shorter it will be. The general concept is applicable to proverbs; the more common proverbs were not shorter, *per se*, but they were shortened at a higher rate. This is suggestive of a general human tendency to operate efficiently in a way that extends beyond the phonological and lexical domains. Furthermore, the most common words tend to be so because they have the most uses, as indicated earlier. A parallel may be drawn between words and proverbs in this respect. The most common proverbs may be applicable to more conversational/social contexts; the more contexts a proverb applies to (i.e. the more uses it has), the greater the need for efficiency.

The discussion of efficiency mentioned above leads us to the second observation made from the results of the present study: speakers are able to negotiate form length and meaning of proverbs while avoiding ambiguity. An advantage of proverbs is the metaphorical meaning that they have in addition to the literal one. This is an advantage that other fixed expressions such as idioms do not have. For example, "cut Barbara some slack" is not the same as "cut Barbara" or "Barbara some slack," where the first has an idiomatic meaning, while the latter two lose their intended meaning; one only has a literal meaning, whereas the other becomes meaningless. In addition to maintaining their meaning, proverbs have a second advantage: intrinsic variability. Regardless of the number of variants or their rate of truncation, proverbs evoke a clear concept in the mind of the speaker-listener. As such, if one of multiple possible variants is produced, the listener (or reader) will arrive at the same concept or meaning being evoked. This last point should not be equated to *the number of variants will predict shortening*, because the results from this study indicate otherwise. Rather, the argument that I am making here is that the variable nature of proverbs provides flexibility in their expression by signifying to the speaker-listener that they are susceptible to being expressed with alternative forms – other than the standard, long form. The result of such an advantage is that speakers (or writers) are able to employ efficient ways of production (i.e. reduced forms), knowing that conceptual meaning remains intact for these proverbs.

Third, these results demonstrate that the language used in the news adheres to Zipf's law; put differently, more frequent proverbs in the news display higher reduction rates. Although the news language genre is perceived as non-natural, it nonetheless follows efficiency conventions seen in other genres of written language as well as natural speech. Considering the parsimonious nature of language use in this industry, the tendency toward linguistic efficiency is not a surprise for the news media, especially if such a behavior also allows for a clear conveyance of specialized topics to the general population.

On the whole, the discussion of these results provides us with a better understanding of the mechanisms used for the shortening of proverbs. As described earlier, usage-based models of language representation argue that language use will impact the mental representation of language (i.e. the grammar). These models provide a better way to capture the frequency effects observed in this study, when compared to the more traditional depiction of proverbs. For instance, the variable nature of proverbs itself contradicts the notion that proverbs can be observed as "enlarged lexical items." While words may have their standard long form and a potential shortened version (e.g. *hippopotamus* → *hippo*), proverbs may have multiple long-form variants, in addition to multiple shortened versions (refer back to (7) for potential variants of the proverb '[*Como la gata flora*] [*cuando le ponen*] [*grita*][*cuando le sacan*] [*llora*]'). The inherent variability of proverbs makes them distinct from lexical items. Furthermore, the question of frequency becomes complicated when each proverb is considered a single word. For instance, it is not entirely clear if variants of a single proverb should be counted as separate words. If so, the next issue that arises is which variant to attribute a reduced token to.

Conversely, usage-based models consider these fixed expressions as strings of words linked by probabilistic generalizations. Exemplars of proverbs will be stored in the mind, each word linked to the next by probabilistic co-occurrence due to the contextual information stored in conjunction with each lexical item. As such, the activation and retrieval of one word in the string will activate the following word. The fact that one word may activate more than one lexical item accounts for the multi-variant proverbs, with higher frequency proverbs having an activation advantage compared to their lower-frequency counterparts. Lastly, the exposure to a shortened form of a proverb will thus activate the remainder of the phrase. This way, hearing or reading a shortened version of a proverb will still allow the listener/reader to retrieve the original conceptual meaning associated with the long-form proverb. Accordingly, usage-based models are able to account for the empirical observations surrounding proverb shortening trends in written news language.

#### **6 Conclusion**

All things considered, there is a clear relationship between general frequency and truncation rates in the production of proverbs in language in the news media. However, the presence of outliers indicates that general frequency is not the only factor that constrains this relationship. Syntactic complexity and variability as potential factors constraining truncation rates do not appear to be able to account for the shortening rates in these data. It is evident that, as reported in Erker & Guy (2012), frequency and shortening in collocations are connected but not independently of other constraints, and some of the constraints employed by some factors, such as variability, may sometimes manifest in more than one direction.

In sum, this study demonstrates that there is a clear correlation between the general frequency of a proverb in a given corpus and its reduction rate. These results corroborate Zipf's law in yet another linguistic domain: proverbs. Remarkably, this phenomenon seems to be very productive in proverbs, considering that altering the proverb's form allows for shortened proverbs to convey their original intended message. Lastly, the characteristics of proverbs discussed by the literature and the observations mentioned in the present study allow us to examine the effectiveness of usage-based models in accounting for the findings, compared to the traditional view of fixed expressions as single words.

While the corpus used in this study was large, the number of proverbs analyzed was small. As such, it became a challenge to identify other possible constraints that may be acting upon the proverbs. Future studies with more exhaustive proverb lists may be able to find stronger connections between the data and factors that may be influencing their production, in addition to general frequency.

#### **Acknowledgements**

First and foremost, I want to acknowledge Dr. Justin Davidson and Dr. Terry Regier for their help in developing and refining this research project. Also, I am grateful to Annie Helms for her initial help in conceptualizing this research idea and her extensive feedback on the study and this manuscript. Furthermore, I want to thank Madeline Bossi for her help with the syntactic analysis of these data. Finally, I want to show appreciation to the two anonymous reviewers and all the people who provided comments and feedback on previous presentations of this study to the Romance Linguistics community at UC Berkeley and to the LSRL 50 audience.

#### **Appendix A List of proverbs**


### **References**


Zipf, George Kingsley. 1936. *The psychobiology of language*. London: Routledge.

Zipf, George Kingsley. 1949. *Human behavior and the Principle of Least Effort*. New York: Addison-Wesley Press.

### **Name index**

Abreu, Laurel, 301–305, 307 Adams, Marianne, 55 Aguilar, Lourdes, 270, 281, 288 Aissen, Judith, 157 Aitchison, Laurence, 325 Akmajian, Adrian, 182 Albizu, Pablo, 159, 160 Alcover, Antoni Maria, 255 Amengual, Mark, 252 Anagnostopoulou, Elena, 94, 159, 160 Arias, Laura, 88, 89, 91, 100 Arnal, Antoni, 249, 250 Arnon, Inbal, 326 Aronoff, Mark, 44 Artstein, Ron, 303 Astruc, Lluïsa, 278 Auger, Julie, 22, 23, 27, 28, 31, 37 Babel, Molly, 224 Badan, Linda, 269, 270, 272, 276, 278, 287, 288 Bailey, Guy, 252 Bailey, Laura R, 120 Baker, Wendy, 229, 230, 251 Baltin, Mark, 136 Bannard, Colin, 326 Barbosa, Pilar, 131 Bates, Douglas, 204 Bates, Elizabeth, 325 Beckman, Mary, 270 Beckman, Mary E., 270

Béjar, Susana, 159, 160 Belletti, Adriana, 96, 98, 170, 192, 272 Benincà, Paola, 44 Bentivoglio, Paola, 44 Bentz, Christian, 325 Bhatt, Rajesh, 189 Bianchi, Valentina, 98, 272 Biberauer, Theresa, 87, 97 Birdsong, David, 216, 225, 252, 254, 255, 279 Bleam, Tonia, 158 Bley-Vroman, Robert, 307, 317 Bloom, Paul, 326 Boeckx, Cedric, 97, 99, 107, 108, 110 Boersma, Paul, 204, 230, 255, 281 Bohn, Ocke-Schwen, 250–252, 262 Bonet, Eulàlia, 159 Borer, Hagit, 132 Bosch, Laura, 250, 252 Bosque, Ignacio, 94 Bossong, Georg, 157 Bourhis, Richard Y., 222 Boutet, Josiane, 28 Brewer, Marilynn B., 223 Bridwell, Keiko, 196, 197, 199, 201, 202, 204, 205, 214, 215 Brocardo, Maria Teresa, 46, 55 Broekhuis, Hans, 99 Brown, Earl K., 324 Brown, Earl Kjar, 323 Bullock, Barbara E., 199, 201, 203, 215, 227

Büring, Daniel, 135, 149, 150 Burnett, Heather, 22 Burzio, Luigi, 92 Buthers, Christiane Miranda, 47 Bybee, Joan, 323, 324 Cameron, Richard, 299, 301, 302, 305, 315, 316 Canadian Parents for French, 225 Candea, Maria, 196, 198, 203 Carbonell, Joam, 250 Cardinaletti, Anna, 54, 123 Carignan, Christopher, 7–9, 11, 12 Carvalho, Ana M., 301 Casagrande, Jean, viii Cecchetto, Carlo, 183, 185 Chappell, Whitney, 263, 264 Cheng, Lisa Lai-Shen, 289 Chernova, Ekaterina, 270, 277 Chomsky, Noam, 41, 51, 54, 87, 96, 97, 99, 113, 114, 123, 131, 146, 271, 272 Chung, Hye-Yoon, 271, 277, 289 Cinque, Guglielmo, 88, 94, 99 Ciucivara, Oana, 170 Clopper, Cynthia G., 227 Colantoni, Laura, 226, 227 Colomé, Àngels, 252 Conboy, Martin, 328 Contreras, Heles, 277 Corbalán, María-Inés, 131–133 Cornilescu, Alexandra, 157, 162, 163, 165, 168, 172 Cortés, Susana, 250, 253 Costa, Albert, 252 Costello, Anna B., 198 Coveney, Aidan, 24–26 Creissels, Denis, 64, 65, 78

Crocco, Claudia, 269, 270, 272, 276, 278, 287, 288 Cruschina, Silvio, 272, 274 Cyrino, Sonia M. L., 42 D'Aisy, Jean, 4 D'Alessandro, Roberta, 85–87, 92– 94, 98, 99, 101 D'Imperio, Mariapaola, 229 D'Imperio, Mariapaola, 227 Dagnac, Anne, 137 Dajko, Nathalie, 26 Dalola, Amanda, 196–199, 201–205, 214–216 Dauphinais Civitello, Ashlee, 133 Davidson, Justin, 250, 263 Davies, Mark, xvii, 89, 329 Dawson, Aidan, 25 De Benito, Carlota, 89 De la Mota, Carme, 276–278 Debrie, René, 25 Delais-Roussarie, Elisabeth, 230 DeLancey, Scott, 271, 272 DeMello, George, 90 den Dikken, Marcel, 183 Déprez, Viviane, 289 Diaconescu, Rodica, 168 Dickinson, Connie, 272 Domínguez, Laura, 277 Dörnyei, Zoltán, 222 Duarte, Fábio Bonfim, 47 Duarte, Maria Eugênia Lamoglia, 42, 43, 47–49, 51, 53–57 Duguine, Maia, 132–135, 142, 144, 145, 279 Dunbar, Robin I. M., 224 Eckert, Penelope, 216, 225, 240

Edmont, Edmond, 25

Ehala, Martin, 224 Ellis, Nick C., 324 Elordieta, Gorka, 277, 279, 287 Éloy, Jean-Michel, 22 Epstein, Samuel D., 85, 87, 96, 99 Erker, Daniel, 324, 326–328, 339 Escandell Vidal, María Victoria, 270, 274, 277 Escandell-Vidal, Victoria, 170 Espinosa, Aina, 250 Estebas-Vilaplana, Eva, 270, 277, 278, 288 Etxepare, Ricardo, 98, 269, 279, 287 Evans, Bronwen G., 251 Fábregas, Antonio, 101 Face, Timothy, 271, 276, 277, 281, 289 Fagyal, Zsuzsanna, 10, 14, 16, 17, 196– 198 Faraci, Robert Angelo, 114 Fernández-Ordóñez, Inés, 90, 95 Fernandez-Serrano, Irene, 88, 89, 91, 100 Feroce, Nick, 300, 303–305 Ferrer-i-Cancho, Ramon, 325, 326 Figueiredo Silva, Maria Cristina, 48 File-Muriel, Richard J., 324 Flege, James E., 202, 215, 250–252, 262 Flikeid, Karin, 25 Flores-Ferrán, Nydia, 302, 307 Flutre, Louis-Fernand, 25 Fodor, Jerry A., 134 Fónagy, Ivan, 5, 6, 196–198 Fonseca-Greber, Bonnibeth Beale, 25, 26 Forrest, Karen, 200 Fougeron, Cécile, 226 Fox, Danny, 146

Fox, John, 205, 257 Franco, Ludovico, 157 Freire, Gilson Costa, 51 Fridland, Valerie, 251 Gabriel, Christoph, 277 Galindo i Solé, Mireia, 249, 250 Gallego, Ángel J., 88, 98, 99, 107, 108, 114, 133 Galves, Charlotte, 55 Geeslin, Kimberly, 301, 303, 305, 314 Geeslin, Kimberly L., 300, 303, 304 Gendron, Jean-Denis, 6 Genesee, Fred, 225, 228 Georgi, Doreen, 87, 99 Giannakidou, Anastasia, 175 Giles, Howard, 224 Gili Fivela, Barbara, 272, 278 Glasbergen-Plas, Aliza, 289 Gómez Soler, Inmaculada, 308 Gómez, Kryzzya, 136, 139 González, Carolina, 275 González, Raquel, 132, 133 González, Carolina, 270, 275, 277 Gramley, Stephan, 326, 327, 330 Green, Jeffrey J., 114 Green, Jeffrey Jack, 110, 111, 122 Groenendijk, Jeroen, 184 Grosjean, François, 251 Gryllia, Stella, 289 Guasch, Marc, 255 Gudmestad, Aarnes, 300, 303, 304 Gussenhoven, Carlos, 271 Gutiérrez-Bravo, Rodrigo, 101, 271, 272, 276 Gutiérrez-Reixach, Javier, 94 Guy, Gregory R., 324, 326–328, 339 Hall-Lew, Lauren, 224, 240, 256

Hansen, Anita Berk, 5 Hayes, Bruce, 226 Heim, Irene, 135, 189 Heise, David R., 225 Herbeck, Peter, 132, 133 Hernández-Fernández, Antoni, 325, 326 Hestvik, Arild, 150 Higgins, Francis Roger, 181, 182 Hill, Virginia, 157, 173 Hillenbrand, James, 202, 215 Hiraiwa, Ken, 93, 94, 97 Hirschberg, Julia, 226 Hirschberg, Julia Bell, 224 Hogg, Michael A., 223, 224 Höhn, Georg F. K., 92 Hole, Daniel, 169 Holmberg, Anders, 42, 54, 55, 87, 93, 97 Hornstein, Norbert, 107, 110, 133–136, 142, 153 Horvath, Julia, 270, 274 Hrkal, Édouard, 23, 25 Hróarsdóttir, Thorbjörg, 97 Hualde, José Ignacio, 255, 270, 276, 278, 288 Hualde, José Ignacio with Sonia Colina, 276, 277 Huttenlauch, Clara, 278 Introno, F. D., 44 Irurtzun, Aritz, 279 Iverson, Paul, 251 Jackendoff, Ray, 272 Jacobson, Pauline, 183, 185 Jaeger, Byron, 257 Janic, Katarzyna, 64, 65, 78 Janke, Vikki, 120

Jenkins, Ricahrd, 225 Jiménez, María Luisa, 269, 274, 284 Jiménez-Fernández, Ángel, 272–274 Johnson, Keith, 250, 256, 327 Jones, Charles, 114 Jun, Sun-Ah, 226 Kanwit, Matthew, 264 Kato, Mary A., 42, 44, 46, 47, 54, 55, 57, 278 Kaufman, Terrence, 250 Kayne, Richard, viii Kendall, Tyler, 251 Kim, Sowoon, 145, 146 King, Ruth, 24–26, 28, 29 Köhler, Reinhard, 326 Kramer, Ruth, 114 Kratzer, Angelika, 135 Kroch, Anthony, 44 Kuznetsova, Alexandra, 204, 256 Laberge, Suzanne, 24, 25 Labov, William, 224, 252, 253, 263 Ladd, D. Robert, 270, 271 Ladefoged, Peter F., 250 Landau, Idan, xiii, 107–111, 114, 116– 119, 122, 125, 136, 168 Lasnik, Howard, 54, 131 Lawson, Robert, 224, 240 Leach, Colin Wayne, 222, 229, 240 Ledieu, Alcius, 25 Lenth, Russell V., 257 Leonardelli, Geoffrey J., 223 Leonetti, Manuel, 85 Lepetit, Daniel, 226, 227 Lev-Ari, Shiri, 251 Levitan, Rivka, 224 Levshina, Natalia, 205 Lightfoot, David, 50

Lindblom, Björn, 197, 215 Lindsay, Paul, 6 Linford, Bret, 303, 304 Linford, Bret G., 304 Lipski, John, 132 Lipski, John M., 323 Livitz, Inna, 131, 132, 134, 140 Livitz, Inna G., 134, 140 Llisterri, Joaquim, 250 Loccioni, Nicoletta, 186, 188 Lodge, Anthony, 24 Long, Avizia Yim, 303, 304 Lopes, Célia Regina dos Santos, 46, 55 López, Luis, 85, 86, 92–94, 96, 99, 126, 157, 169–171, 272 Lozano, Cristóbal, 300, 315 Luce, Paul A., 324 Lüdecke, Daniel, 205 Lyster, Roy, 227, 228 MacDonald, Jonathan E., 87, 98 Machuca Ayuso, María, 272, 278 MacKinnon, Neil J., 225 Magalhães, Telma Moreira Vianna, 50, 51 Mahowald, Kyle, 323 Malécot, André, 6 Mange, Jessica, 224 Manzini, M. Rita, 157 Mardale, Alexandru, 157, 173 Martin, Pierre, 196, 197 Masullo, Pascual José, 65 Matthews, Danielle, 326 McAuliffe, Michael, 255 McCafferty, Kevin, 224, 240 McCarthy, Corrine, 17 McFadden, Thomas, 132 McLeod, Julie, 225

Medová, Lucie, 65, 75 Melis, Ludo, 65 Mendikoetxea, Amaya, 85–87, 89, 90, 95, 96, 99 Mensching, Guido, 131–134, 140 Merchant, Jason, 192 Michelas, Amandine, 229 Mieder, Wolfgang, 324 Miller, Gary, 131 Miozzo, Michele, 252 Modesto, Marcello, 48 Moisset, Christine, 196–198 Moll, Francesc de Borja, 255 Montemurro, Marcelo A., 325 Mora, Joan C, 253, 263 Morgan, John, 255 Morin, Yves-Charles, 4 Mouriquand, Jacques, 328, 329 Munson, Benjamin, 200, 213, 216 Nadeu, Marianna, 253, 263 Nardy, Aurélie, 224, 240 Netelenbos, Nicole, 225 Nevins, Andrew, 114 Newman, Michael, 263 Nicholas, Jessica, 8–11, 13–16 Nielsen, Kuniko, 224 Nikbakhtt, Masoomeh, 205 Nishida, Chiyo, 65, 75 Nunes, Jairo Morais, 46 Nycz, Jennifer, 256 O'Rourke, Erin, 277 Obata, Miki, 85, 87, 96, 99 Oku, Satoshi, 142, 145, 146 Onea, Edgar, 169 Ordóñez, Francisco, 93–95, 98 Ormazabal, Javier, 86, 88, 94, 96, 157– Ortega-Llebaria, Marta, 277 Ortega-Santos, Iván, 85 Ortega-Santos, Iván, 273 Ortiz de Urbina, Jon, 279, 287 Ortiz López, Luis A., 133 Osborne, Jason W., 198 Otero, Carlos P., 94 Otheguy, Ricardo, 301, 302, 307, 316 Pamies-Bertrán, Antonio, 281 Paternostro, Roberto, 198, 203 Pätzold, Kurt-Michael, 326, 327, 330 Pedicone de Parellada, Elena Florencia, 330, 331, 336 Peperkamp, Sharon, 251 Perez Tattam, Rocío Simone, 132 Perlmutter, David, 159 Perlmutter, David M., 54 Péronnet, Louise, 25 Perreault, Stéphane, 222 Perry, Pamela, 225 Pescarini, Diego, 92, 93 Pesetsky, David, 110 Phillips, Betty S., 324 Piantadosi, Steven T., 325 Piera, Carlos, 132–134, 140 Pierrehumbert, Janet B., 226, 270, 324 Pierrehumbert, Janet Breckenridge, 270 Pilati, Eloisa Nascimento Silva, 47 Pinker, Steven, 326 Polinsky, Maria, 64, 65, 76 Poljak, Livia, 228, 233, 240 Pooley, Tim, 27 Pope, Emily, 277 Posner, Rebecca, 2, 5, 7 Prada Pérez, Ana de, 300, 303–305, 308

Preminger, Omer, 92, 97, 160 Prieto, Pilar, 270, 276–279, 281, 288 Pujalte, Mercedes, 88, 94 Qualtrics, 255 Ramon-Casas, Marta, 250, 252 Rao, Rajiv, 279 Raposo, Eduardo, 86, 88, 94, 98 Reay, Diane, 222 Recasens, Daniel, 250 Reglero, Lara, 269, 270, 274, 275, 277, 279, 287 Rehner, Katherine, 28 Reicher, Stephen David, 223 Repp, Sophie, 288 Rezac, Milan, 159, 160 Richards, Marc, 168 Richards, Marc D., 87, 97 Richards, Norvin Waldemar, 114, 123 Rico, Pablo, 133 Riebold, John, 256 Rigau, Gemma, 98, 131, 132 Ríos, Antonio, 272, 278 Rivero, María Luisa, 93, 127 Rizzi, Luigi, 41, 54, 110, 131, 146, 192, 272 Roberts, Gareth, 224 Roberts, Ian, 54, 55, 123 Rochemont, Michael, 270, 274 Rodríguez-Mondoñedo, John, 157, 168, 170 Rodríguez-Mondoñedo, Miguel, 85, 126 Romera, Magdalena, 279 Romero, Juan, 86, 88, 94, 96, 157–161, 164 Romero, Maribel, xiv, 183, 184 Rooryck, Johan, 289

Roseano, Paolo, 270, 279 Rosin, Lena, 288 Ross, John Robert, 183 Rosselló, Joana, 98 Rottet, Kevin J., 26 Ruhlen, Merritt, 4 Saab, Andrés, 88, 94, 137 Saciuk, Bohdan, viii Saito, Mamoru, 142, 145, 146 Salvi, Giampaolo, 44 Sampson, Rodney, 1, 2, 5 Sánchez López, Cristina, 86–89, 99 Sankoff, Gillian, 224 Sansò, Andrea, 64, 65, 78 Schlenker, Philippe, xiv, 183, 184, 191 Schulte, Kim, 132 Schütze, Carson, 133 Sharvit, Yael, 183, 185, 189 Sheehan, Michelle, 42 Shin, Naomi Lapidus, 303, 304 Sigurðsson, Halldór Ármann, 87, 92, 93, 97 Simões, Luciene, 50 Simonet, Miquel, 250 Sinclair, John M., 324, 326 Smirnova, Liudmila, 25 Smith, Caroline L., 196–198 Snider, Neal, 326 Sobin, Nicholas, 277 Solé, Ricard V, 326 Sorace, Antonella, 75 Spears, Russell, 222 Sportiche, Dominique, 149 Starke, Michal, 54 Stegovec, Adrian, 170 Stokhof, Martin, 184 Strauss, Udo, 325 Sunara, Simona, 226

Sundaresan, Sandhya, 132 Suñer, Margarita, 123, 133 Swarup, Samarth, 17 Szabolcsi, Anna, 131, 132, 134 T'hart, Johan, 281 Tagliamonte, Sali A., 254 Tajfel, Henri, 222, 223 Takahashi, Daiko, 142, 145, 146 Takahashi, Shoichi, 189 Tarallo, Fernando, 42, 46 Terrill, Angela, 65, 78 Terzi, Arhonto, 127 Thomason, Sara Grey, 250 Ticio, Emma, 269, 270, 274, 275, 277 Tigău, Alina M., 157 Toda, Martine, 204 Toledo, Andrés G., 281 Toribio, Almeida Jacqueline, 55 Torrego, Esther, 87, 94, 95, 98, 110, 126, 132, 157 Torres Cacoullos, Rena, 300–302, 307, 308 Travis, Catherine E., 300–302, 307, 308 Trenchs-Parera, Mireia, 263 Treviño, Esthela, 93–95, 98 Trofimovich, Pavel, 229, 230, 251 Tuller, Laurice, 270, 274 Turner, John C., 223 Uriagereka, Juan, 86, 88, 94, 98, 131 Uribe-Etxebarria, Myriam, 269, 274, 277 Vajargah, Kianoush Fathi, 205 Vasseur, Gaston, 25 Villeneuve, Anne-José, 23, 27, 28, 31, 37

Vitevitch, Michael S., 324 Walker, Douglas C., 6 Waugh, Linda R., 25, 26 Weenink, David, 204, 230, 255, 281 Weisberg, Sanford, 205 Welby, Pauline, 226 Wickham, Hadley, 205 Williams, Edwin, 107, 108, 111, 131, 136 Wimmer, Gejza, 325 Woolford, Ellen, 99 Wright, Sue, 2

Yeni-Komshian, Grace H., 251

Zagona, Karen, 132 Zahler, Sara, 304 Zdrojewki, Pablo, 158 Zellou, Georgia, 224 Zentella, Ana Celia, 302 Zilles, Ana M. S., 26 Zimman, Lal, 200, 215 Zipf, George Kingsley, 324–326 Zubizarreta, Maria Luisa, 46, 270–274

## A half century of Romance linguistics

This volume offers a selection of the revised and peer-reviewed proceedings articles of the 50th Linguistic Symposium on Romance Languages (LSRL 50), which was hosted virtually by faculty and students of the University of Texas at Austin. With contributions from rising and senior scholars from Europe and the Americas, the volume demonstrates the breadth of research in contemporary Romance linguistics, with articles that apply corpus-based and laboratory methods, as well as theory, to explore the structure, use, and development of the Romance languages. The articles cover a wide range of fields, including morphosyntax, semantics, language variation and change, sociophonetics, historical linguistics, language acquisition, and computational linguistics. In an introductory article, the editors document the sudden transition of LSRL 50 to a virtual format and acknowledge those who helped them ensure the continuity of this annual scholarly meeting.