# The role of constituents in multiword expressions

An interdisciplinary, cross-lingual perspective

Edited by Sabine Schulte im Walde & Eva Smolka

Phraseology and Multiword Expressions 4

## Phraseology and Multiword Expressions

### **Series editors**

Agata Savary (University of Tours, Blois, France), Manfred Sailer (Goethe University Frankfurt a. M., Germany), Yannick Parmentier (University of Lorraine, France), Victoria Rosén (University of Bergen, Norway), Mike Rosner (University of Malta, Malta).




Schulte im Walde, Sabine & Eva Smolka (eds.). 2020. *The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective* (Phraseology and Multiword Expressions 4). Berlin: Language Science Press.

This title can be downloaded at: http://langsci-press.org/catalog/book/239

© 2020, the authors

Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/

ISBN: 978-3-96110-184-9 (Digital), 978-3-96110-185-6 (Hardcover)

ISSN: 2625-3127

DOI: 10.5281/zenodo.3598577

Source code available from www.github.com/langsci/239

Collaborative reading: paperhive.org/documents/remote?type=langsci&id=239

Cover and concept of design: Ulrike Harbort

Typesetting: Felix Kopecky

Proofreading: Amir Ghorbanpour, Amr El-Zawawy, Andreas Hölzl, Carlos Ramisch, Felix Hoberg, Felix Kopecky, Gerald Delahunty, Ivelina Stoyanova, Ivica Jeđud, Jean Nitzke, Jeroen van de Weijer, Jezia Talavera, Lachlan Mackenzie, Steve Pepper, Tom Bossuyt, Valeria Quochi, Yvonne Treis

Fonts: Libertinus, Arimo, DejaVu Sans Mono

Typesetting software: XƎLATEX

Language Science Press, Unter den Linden 6, 10099 Berlin, Germany, langsci-press.org

Storage and cataloguing done by FU Berlin



## **Constituents in multiword expressions: What is their role, and why do we care?**

Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

Eva Smolka Fachbereich Linguistik, Universität Konstanz

## **1 Introduction**

The processing and representation of multiword expressions (MWEs), ranging from noun compounds (such as *nickname* in English and *Ohrwurm* in German) to complex verbs (such as *give up* in English and *aufgeben* in German) and idiomatic expressions (such as *break the ice* in English and *das Eis brechen* in German) have remained an unsettled issue over the past 20+ years.

Our research question concerns semantically transparent MWEs as well as MWEs that result in a meaning shift. For example, in the absence of situational experience, even complex verbs that appear to be fully semantically transparent such as *aufstehen* ('stand up') do not necessarily have whole-word meanings that are easily predictable from their constituents. Even more difficult are complex verbs such as *verstehen* ('understand') and *zustehen* ('legally due'), which bear only a remote resemblance to the meaning of *stehen* ('stand'). Similarly, the constituents of noun compounds do not necessarily contribute to their whole-word meanings in a straightforward way. The meaning contribution may range from relatively semantically transparent as in *Nudelsuppe* ('noodle soup') to semantically opaque, as in *Spitzname* ('nickname', lit. 'pointy name'), *Geduldsfaden* ('patience', lit. 'patience thread'), or *Zwickmühle* ('dilemma', lit. 'pinch mill'), which contain a modifier (i.e. the left constituent) and/or a head (i.e. the right constituent) that render the compound semantically more opaque. The most extreme meaning shifts across types of MWEs occur in idiomatic constructions, such as *kick the bucket* and *reach for the stars*, where the literal meanings of the constituents do not seem to contribute to the overall figurative meanings 'die' and 'strive for something unachievable' at all. MWEs of the idiomatic type are typically assumed to be semantically opaque, even though some idioms like *spill the beans* are stronger in reflecting the figurative meaning ('reveal a secret') in a metaphoric way than others.

This edited volume exploits complementary evidence across different types of MWEs to shed light on the interaction of constituent properties and meanings of MWEs. Specialists across languages and across research disciplines contribute to this issue and provide a cross-linguistic perspective integrating linguistic, psycholinguistic, corpus-based and computational studies.

## **2 Contributions**

In the following, the seven contributions in this volume discuss multiword expressions that are composed of different types of constituents, including the combination of particle+stem in complex verbs (e.g., *aufstehen* 'stand up'), the combination of stem+stem in existing and novel compounds (e.g., *nickname* and *clampeel*, respectively), the combination of stem+stem+suffix in deverbal compounds (e.g., *budget assessment*), the combination of stem+preposition+stem in noun compounds (e.g., *juego de niños*), the combination of modifier+stem in modifier-noun phrases (e.g., *the brown dog*) and idiomatic combinations of words (e.g., *reach for the stars*).

Sections 2.1 to 2.3 discuss the interdisciplinary perspectives separately for complex verbs, noun compounds and idiomatic expressions, and for each of these three categories of MWEs we summarise the contributions to this collection.

## **2.1 Complex verbs**

Seminal psycholinguistic studies have applied manipulations of semantic transparency to study whether verbal MWEs of the type prefix+stem, particle+stem and stem+suffix are lexically represented and processed via the constituents or as a whole-word unit (e.g., Taft & Forster 1975; Marslen-Wilson et al. 1994; Longtin et al. 2003).

Recurrent findings in English and French showed that semantically transparent words facilitate their base (e.g., *distrust–trust, confessor–confess*). This facilitation effect, however, was not obtained for semantically opaque primes (e.g., *retreat–treat, successor–success*). Lexicon-based models concluded from these findings that a semantically transparent word like *confessor* possesses a lexical entry that corresponds to its base and is represented as the stem (*-confess-*) and suffix (*-or*), whereas *successor* is represented in its full form (e.g., Rastle et al. 2000; Feldman et al. 2004; Diependaele et al. 2005; 2009; Meunier & Longtin 2007; Marslen-Wilson et al. 2008; Taft & Nguyen-Hoan 2010).

Semantic transparency effects emerge also when transparency is manipulated in a more graded way (Gonnerman et al. 2007): Strong facilitation effects showed for strongly phonologically and semantically related word pairs (e.g., *preheat–heat*), intermediate effects for moderately similar pairs (e.g., *midstream–stream*), and no priming for low semantically related word pairs (e.g., *rehearse–hearse*). Within learning-based approaches, such as the convergence-of-codes account, form and meaning relatedness between word pairs determines lexical processing (Plaut & Gonnerman 2000; Gonnerman et al. 2007).

Findings in German, however, indicate that lexical processing occurs via the stem and irrespective of semantic transparency (i.e., meaning composition of the complex verb). Low semantically related word pairs (*entwerfen–werfen* 'design'–'throw') induced facilitation of the stem to the same extent as semantically related word pairs did: *bewerfen–werfen* ('throw at'–'throw') (e.g., Smolka et al. 2009; 2014; 2015; 2019). Most importantly, these findings stress the importance of cross-language comparisons: what is true for the processing in one language is not necessarily true for the processing in another language (Günther et al. 2018).

Computational approaches regarding the meanings of complex verbs have mainly focused on predicting the degree of transparency of complex verbs. These approaches typically rely on the distributional hypothesis (Harris 1954; Firth 1957) and empirical co-occurrence information from large corpora, and are realised as vector space models (Turney & Pantel 2010). Regarding English, computational approaches explored variants of distributional models and distributional similarity, comparing word-based and syntax-based descriptions, large-scale vs. dimensionality-reduced representations, and verb-specific vs. general information (Baldwin et al. 2003; McCarthy et al. 2003; Bannard 2005; Cook & Stevenson 2006; i.a.). Regarding German, an initial series of papers (Aldinger 2004; Schulte im Walde 2004; 2005; 2006) studied particle verbs from a large-scale corpus-based perspective, with an emphasis on salient distributional features at the syntax-semantics interface. Schulte im Walde (2006) and Bott & Schulte im Walde (2018) integrated the subcategorisation transfer of German particle verbs with respect to their base verbs into models of compositionality. Kühner & Schulte im Walde (2010), Bott & Schulte im Walde (2017), and Köper & Schulte im Walde (2017a) used clustering to distinguish between multiple senses, and common cluster membership to determine compositionality. Köper & Schulte im Walde (2016) and Aedmaa et al. (2018) applied classifiers to identify figurative language usage of German and Estonian particle verbs in context.
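The distributional idea behind these models can be illustrated with a minimal sketch. It assumes that each verb is represented by a vector of co-occurrence counts and takes the cosine similarity between a complex verb and its base verb as a rough proxy for transparency. The counts, the context words and the function name `transparency_proxy` are invented for illustration; they are not taken from any of the cited studies, which use far richer corpus-based representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no distributional evidence
    return dot / (norm_u * norm_v)


# Invented co-occurrence counts over four hypothetical context words.
# These are NOT corpus data; they only make the computation concrete.
CONTEXTS = ["ball", "stone", "plan", "architect"]
VECTORS = {
    "werfen": [40, 30, 1, 0],     # base verb 'throw'
    "bewerfen": [25, 35, 0, 1],   # 'throw at'
    "entwerfen": [0, 1, 30, 20],  # 'design'
}


def transparency_proxy(complex_verb, base_verb, vectors=VECTORS):
    """Similarity of a complex verb to its base verb (hypothetical helper)."""
    return cosine(vectors[complex_verb], vectors[base_verb])


if __name__ == "__main__":
    for verb in ("bewerfen", "entwerfen"):
        print(verb, round(transparency_proxy(verb, "werfen"), 3))
```

With these toy counts, *bewerfen* comes out as much closer to *werfen* than *entwerfen* does. Note that such a score only reflects corpus-based similarity; the German priming findings above show that it need not predict processing behaviour.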

So far, most approaches that have dealt with complex verbs – across disciplines and across languages – have considered semantic transparency as the meaning relation between the whole word meaning of the MWE and the meaning of its base constituent, disregarding the contribution of the often ambiguous prefix or particle, e.g., they were concerned with the question: to what degree is the meaning of *stand* reflected in *understand*? Apart from a series of formal word-syntactic analyses in the framework of Discourse Representation Theory (Kamp & Reyle 1993) for German particle verbs with the particles *auf* (Lechler & Roßdeutscher 2009), *ab* (Kliche 2011), *nach* (Haselbach 2011) and *an* (Springorum 2011), this gap of knowledge has recently been addressed from experimental perspectives: Frassinelli et al. (2017) demonstrated in a lexical decision experiment that the particle *an* in German particle verbs is primarily associated with a horizontal directionality, while *auf* is primarily associated with a vertical directionality. Schulte im Walde et al. (2018) and Köper & Schulte im Walde (2018) present data collections to assess meaning components in German complex verbs. The former dataset contains source- and target-domain characteristics of the base verbs and the complex verbs, respectively, and a selection of arrows to add spatial directional information to user-generated contexts; the latter dataset contains ratings for strengths of particle-related pairs of German base verbs and particle verbs.

As part of the present collection, **Springorum & Schulte im Walde** also focus on the meaning contribution of the particle to the overall meaning of German particle verbs. They combine nine particles (e.g., *auf* 'up') with 30 base verbs (e.g., *geben* 'give') and examine how the particles are perceived in adding directionality (i.e., up, down, left, right) to the meaning of the particle verb (e.g., *aufgeben* 'give up'). That is, the participants in their study saw a base verb or a particle verb and decided which type of directionality in form of two-dimensional arrows best reflects the verbal meaning. Their qualitative and quantitative analyses indicate that the particles exhibit individual spatial profiles, but also that the particles vary in their flexibility to provide predominant directions, in interaction with the abstractness of the semantic base verb domains.

## **2.2 Noun compounds**

Compounds also lie on a continuum between relatively transparent and rather opaque with respect to the meanings of their constituents. Psycholinguistic research so far has been intrigued by the question whether the compound is lexically represented and processed via the constituents or as a whole-word unit. For example, findings on the processing of noun-noun compounds indicate a competition between the compounds' constituents that correspond to independent words and their whole-word counterparts. Hence, upon seeing the compound *doughnut*, the constituent [nut] may compete with the whole word *nut* (e.g., Libben 2006; Frisson et al. 2008; Monahan et al. 2008; Fiorentino & Fund-Reznicek 2009; Gagné & Spalding 2009; 2014; Libben 2014). Another question concerns whether the semantic transparency of the constituents affects the processing of the MWE they compose, and if so, how. Indeed, semantically opaque compounds are generally processed more slowly than semantically transparent ones, and are less likely to show constituent activation – probably because the semantic opacity of the whole compound makes its constituents less relevant to lexical comprehension (e.g., Taft & Forster 1975; Sandra 1994; Zwitserlood 1994; Isel et al. 2003; Libben et al. 2003). Furthermore, recent studies indicate that the influence of semantic transparency is language-specific. The semantic transparency of the head has been found to affect the processing of noun-noun compounds in English and Italian (e.g., Marelli et al. 2009; Marelli & Luzzatti 2012) but not in German (e.g., Smolka & Libben 2017).

Computational approaches to predicting the transparency of noun compounds can be subdivided into two subfields:


As for complex verbs, the computational models under 2. typically rely to a large extent on the distributional hypothesis and empirical co-occurrence information from large corpora. Individual research studies noticed differences in the contributions of modifier and head constituents towards the composite functions predicting compositionality (Reddy et al. 2011; Schulte im Walde et al. 2013), but only a very limited number of approaches zoomed into potentially relevant properties of MWEs and their constituents, such as ambiguity, frequency and productivity (Bell & Schäfer 2016; Schulte im Walde et al. 2016).


In this collection, **Pezzelle & Marelli** apply a distributional semantic model to show that the semantic properties of the compound and its constituents may explain syntactically-based classes of compounds as suggested in linguistic theories (Bisetto & Scalise 2005). They differentiate between types of compounds such as subordinate, attributive, and coordinate compounds, on the basis of the underlying syntactic relation between the compound constituents. In particular, Pezzelle and Marelli provide measures that quantify (a) the degree of semantic similarity between the constituents, and (b) the contribution of each constituent to the overall compound meaning, and show that these semantic measures are effective in capturing the different syntactic linguistic classes. In other words, the continuous quantitative semantic aspects of the meanings of compounds parallel the discrete qualitative grammatical distinctions between compounds.

**Iordăchioaia, van der Plas & Jagfeld** study the compositionality of English deverbal compounds. These deverbal nouns are ambiguous between compositionally interpreted "argument structure nominals", which inherit verbal structure and realise arguments (e.g., *assessment of the budget by the government*), and more lexicalized "result nominals", which preserve no verbal properties or arguments (e.g., *budget assessment*), cf. Grimshaw (1990). While the former are fully compositional, the latter remain ambiguous because the non-head (*budget*) can be interpreted as either subject or object. The authors apply machine-learning techniques to evaluate corpus data and human annotations to support their hypothesis and find that different properties of the head contribute to the interpretation of the deverbal compound.

In the third chapter on compounds, **Libben** investigates English compounds from a psycholinguistic perspective. He uses novel compounds such as *anklecob* and *clampeel*, the former being unambiguous, the latter being ambiguous in the way they can be parsed (i.e. *ankle-cob* versus *clam-peel* or *clamp-eel*, respectively). A typing experiment shows that the typing latencies indeed peak at the morpheme boundary of non-ambiguous compounds. Equivalent latencies at the critical letters of ambiguous compounds indicate that they are parsed in both possible reading ways. Libben refers to the heuristics of his Fuzzy Forward Lexical Activation account, which assumes that MWEs are parsed from left to right for any possible word combination. He concludes that complex words are not static representations but rather patterns of actions.
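The notion of parsing ambiguity in these novel compounds can be made concrete with a small sketch. It scans a letter string from left to right and collects every split into two known words. The function name `two_part_parses` and the toy lexicon are assumptions made for illustration; the sketch only enumerates candidate segmentations and is not an implementation of Libben's account or of the typing experiment.

```python
def two_part_parses(letter_string, lexicon):
    """Return every split of letter_string into two lexicon words.

    Boundaries are tried from left to right, so the parses are ordered
    by the position of the candidate morpheme boundary.
    """
    parses = []
    for boundary in range(1, len(letter_string)):
        left = letter_string[:boundary]
        right = letter_string[boundary:]
        if left in lexicon and right in lexicon:
            parses.append((left, right))
    return parses


# Toy lexicon (hypothetical): just the words needed for the two examples.
TOY_LEXICON = {"ankle", "cob", "clam", "clamp", "peel", "eel"}

if __name__ == "__main__":
    for novel_compound in ("anklecob", "clampeel"):
        print(novel_compound, two_part_parses(novel_compound, TOY_LEXICON))
```

Under this toy lexicon, *anklecob* yields a single parse, whereas *clampeel* yields two, with candidate boundaries after *clam* and after *clamp*.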

Two papers deal with MWEs that are untypical compound constructions for which linguistic theories in general refer to the notions of lexicon and syntax and debate whether these MWEs are to be considered as compounds or not. **Hennecke** examines the formation of MWEs of the type "N Prep N" in Romance languages, such as Spanish, French and Portuguese (e.g. *juego de niños*, 'kid's game') and takes a constructionist approach to analyse the constructions as abstract templates. In a qualitative analysis, she examines the variation that the preposition in a construction may undergo (e.g. *juego de niños* vs. *juego para niños*, both meaning 'kid's game'). To this end, she analyses the semantic relations between the nominal constituents and the semantic transparency of the constructions. Her findings indicate that variability of the prepositional element occurs only in semantically transparent constructions. Furthermore, prepositional variability largely varies across the three Romance languages.

Also **Gagné, Spalding, Burry & Adams** examine MWEs that are not typically classified as compounds and compare modifier-noun phrases (e.g., *the brown dog*) with full phrases (e.g., *the dog that was brown*). They examine how modifying information that refers to recently encountered information is used in the production of MWEs, and manipulate the property of the head noun between normal (e.g., *brown*) and distinctive (e.g., *blue*). Participants showed a strong overall bias toward using a modifier-noun phrase structure (regardless of whether they previously saw a modifier-noun phrase or a full phrase), and were more likely to include distinctive properties (*the blue dog*) than normal properties (*the brown dog*) when referring to the concept. These findings indicate that modifier-noun phrases have a privileged status among MWEs and provide a good compromise between conveying sufficient information and using simple syntactic structures.

## **2.3 Idioms**

Idiomatic expressions are the MWEs which may be considered as showing the strongest semantic shift that the constituents undergo, because the figurative meaning is usually not even remotely connected with the meaning of its constituents, as in *hit the road*. Rather, idiomatic expressions are considered semantically fixed, since the figurative meaning does not allow the replacement of any of the word constituents (e.g., *\*she hit the street; \*she beat the road*), and the modification of an idiomatic constituent is assumed to change the figurative meaning into a literal meaning.

The processing and representation of idioms has thus remained an unsettled issue in psycholinguistic research: how is the figurative meaning processed and stored in lexical memory? In particular, is the figurative meaning of an idiom represented separately from the meaning of its constituents, and how is the figurative meaning assembled (e.g., Cacciari & Tabossi 1988; Gibbs Jr. 1992; Cacciari & Glucksberg 1994; Titone & Connine 1999; Hamblin & Gibbs Jr. 2003)? Seminal studies thus assumed a "non-compositional" representation in which the whole figurative meaning of an idiom is stored as a distinct entry in the mental lexicon similar to the representation of a complex word like *Finanzmarktaufsichtsbehörde* ('financial market supervisory authority') (e.g., Bobrow & Bell 1973; Swinney & Cutler 1979; Gibbs Jr. 1980). More recent hybrid models try to integrate the assumption that idioms are both compositional and unitary: on the one hand, an idiom is composed of single constituents that are activated to some degree, and on the other hand each idiom possesses its own lexical entry that stores the whole meaning of the idiom (e.g., Cacciari & Tabossi 1988; Gibbs Jr. et al. 1992; Cutting & Bock 1997; Titone & Connine 1999; Sprenger et al. 2006; Caillies & Butcher 2007; Holsinger & Kaiser 2013; Titone & Libben 2014).

As far as computational work on idiomatic expressions is concerned, several research studies measured the syntactic flexibility of idiomatic expressions, to a large extent focusing on verb–object combinations (e.g., Bannard 2007; Fazly et al. 2009). These measures varied the constituents of the target MWEs, explored modifiability and passivisation, etc. in order to distinguish between literal vs. idiomatic interpretations. A large number of automatic classification approaches addressed idioms as non-literal language across various types of MWEs, mostly relying on contextual indicators to distinguish between literal and idiomatic interpretations (e.g., Sporleder & Li 2009; Turney et al. 2011; Köper & Schulte im Walde 2016), such as distributional similarity, text cohesion graphs, and contextual abstractness. The variation-based approaches further provide some insight into the flexibility of the constituents of MWEs and their meaning contributions.

The last paper by **Smolka & Eulitz** deals with idioms and how the meaning of the constituents contributes to the figurative meaning. They present three experiments, in which participants rate the meaning similarity between an idiomatic phrase (e.g., *She always reached for the stars*) and a paraphrase of its figurative meaning (e.g., *She always strove for something unreachable*). They exchange the noun, verb, or prepositional idiomatic constituent by a close semantic associate (e.g., *She always reached/grasped for/at the stars/planets*) and find that a modified constituent still preserves the figurative meaning. This study adds to the understanding that there is no completely fixed unitary entry and that the idiomatic constituents do contribute to the figurative meaning of the idiom, even though the figurative meaning is semantically opaque.

## **Acknowledgements**

This collection was supported by the DFG Collaborative Research Centre SFB 732 and the DFG Heisenberg Fellowship SCHU-2580/1 (Sabine Schulte im Walde), and by the Volkswagen Foundation Grant FP 561/11 (Eva Smolka). Special thanks go to our student researcher Anurag Nigam who type-set this volume. Last but not least we thank our experts from the interdisciplinary fields who ensured a qualitatively high-standing reviewing process:


## **References**




Zwitserlood, Pienie. 1994. The role of semantic transparency in the processing and representation of Dutch compounds. *Language and Cognitive Processes* 9(3). 341–368.

## **Chapter 1**

## **Aiming with → arrows ← at particles: Towards a conceptual analysis of directional meaning components in German particle verbs**

Sylvia Springorum

Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

## Sabine Schulte im Walde

Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

This article presents a case study on the contributions of prepositional particles to the meanings of German particle verbs (such as *anstrahlen* 'to beam/smile at' and *aufgeben* 'to give up'). Based on a set of 16 "concept images", two-dimensional directional arrow pictographs, 60 experiment participants selected one or more concept images for a systematically composed set of 270 German particle verbs and their 30 base verbs. We formulate a series of hypotheses for the meanings of nine constituent particle types (*ab, an, auf, aus, ein, mit, nach, vor, zu*) and investigate them in the light of the concept image selections. Qualitative and quantitative analyses indicate that our hypotheses are largely confirmed, across three source domains varying in their abstractness (Machines & Tools, Force, Sound), as well as across well-known vs. unknown particle verbs. The particles exhibit individual concept image profiles, and they vary in their flexibility to provide predominant directions; for example, while *auf* is rather consistently perceived as contributing an upward/right direction to a particle verb meaning, *an* shows similarly strong preferences for a set of concept images; in both cases, these tendencies are observed across source domains.

Sylvia Springorum & Sabine Schulte im Walde. 2020. Aiming with → arrows ← at particles: Towards a conceptual analysis of directional meaning components in German particle verbs. In Sabine Schulte im Walde & Eva Smolka (eds.), *The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective*, 1–32. Berlin: Language Science Press. DOI:10.5281/zenodo.3598550


## **1 Introduction**

German particle verbs (PVs) are complex, separable verb structures such as *anstrahlen/strahlen … an* 'to beam/smile at' that combine a prefix particle (*an*) with a base verb (*strahlen* 'to beam/smile'). PVs represent a type of multiword expression, a class generally known as a "pain in the neck for NLP" (Sag et al. 2002). Moreover, German PVs pose a specific challenge, because the particles are highly ambiguous; e.g., the particle *an* has a partitive meaning in *anbeißen* 'to take a bite', a cumulative meaning in *anhäufen* 'to pile up', and a topological meaning in *anbinden* 'to tie to' (Springorum 2011). In addition, the particles often trigger meaning shifts of the base verbs (BVs), cf. Springorum, Utt, et al. (2013); Frassinelli et al. (2017); Köper & Schulte im Walde (2018); Schulte im Walde et al. (2018); e.g., the PV *abschminken* with the BV *schminken* 'to put on make-up' has a literal meaning in a concrete context 'to remove make-up', as in example (1), and a metaphorical meaning in an abstract context 'to forget about something', as in example (2).


Not only the particle types but also the particle verbs as a whole often have more than a single reading. For example, the PV *anstrahlen* not only means 'to beam at' but also 'to smile at', when derived from the metaphorical meaning of *strahlen* 'to beam', i.e. 'to smile'. The PV *abnehmen* not only means 'to take off/away', but can also be used to express 'to reduce' as an incremental interpretation of 'to take off/away'; in addition, it has obtained the specific sense 'to reduce weight'. The semantic decomposition in the latter two examples seems to be less transparent than in the previous ones, thus indicating different degrees of PV compositionality. Accordingly, we also find opaque compositions such as *aufhören* 'to stop', where the semantics of the BV *hören* 'to hear' does not seem to provide any contribution to the PV meaning at all. Such examples are the reason why PV composition is often deemed idiosyncratic, cf. Kratzer (2003).

In this chapter, we explore the meaning contribution of particle types to the meanings of German particle verbs across three semantic domains of base verbs, which vary in their degree of abstractness: Machines & Tools, Force, and Sound.

Within our study, we focus on prepositional particle types and the role of directionality. In this vein, Section 2 will motivate our assumptions about particle meanings in German PVs in more detail, before Section 3 presents the design, hypotheses and results of an experiment that collected human judgements on directionality in particle meanings. Section 4 discusses the experiment data and reflects on our preceding assumptions about prepositional particle meanings.

## **2 Particle meanings**

## **2.1 Basic particle meanings and contexts**

For the course of this article, we assume that each particle type has a restricted number of simple primary meanings, which we refer to as *basic meanings*. This is in accordance with Lindner (1983), who identifies a prototypical sense for the English verb particle *out* involving 'paths in the spatial domain'. Without a BV context, the basic particle meanings are initially underspecified, and then resolved by contextual constraints provided by the BV. For example, the separation introduced by the particle *ab* in the context of the BV *nehmen* 'to take' evokes a change of state 'to take off/away', whereas in the context of the BV *schminken* 'to put on make-up' it evokes a duration 'to remove make-up' generated by a sequence of separations. However, not only the BV but also further context has to be taken into account, as there are ambiguous PVs with varying particle meaning contributions. For example, regarding the metaphorical meaning of *abschminken* in example (2), *ab* introduces only a single separation event, in contrast to the sequence of separation events in the literal PV reading.

Previous research has pointed out regularities in the interpretation of particle meanings associated with semantically coherent classes of base verbs, cf. Stiebels (1996); Lechler & Roßdeutscher (2009); Kliche (2011); Springorum (2011). For example, direction and contact represent two independent readings of *an*, among others: The PV in example (3) belongs to the direction meaning class, suggesting that *an* assigns a direction to the BV, whereas in example (4) the PV carries a contact particle meaning. In combination with a movement BV as in example (5), the particle again introduces a direction. In addition, the meaning of *anfahren* requires a decreasing distance, which results in a contact when maximal. Therefore, *anfahren* represents an example with meaning components from both classes, direction and contact. Examples (3–5) show that particle senses vary in their complexity, and they also illustrate the limits of a hard class assignment.



In addition, a classification of PVs should not only take lexical information into account. Sentimental connotations, associations to other sensory input, (nature) forces, and dimensionality are just as well involved in the process of sense development. For example, the metaphorical PV *abklappern* (lit. [ab]+'to clatter') illustrates that sensory information can be understood as a part of the PV meaning: *abklappern* creates an ideophone, which is mapped to the verb event, and leads to the meaning-shifted sense 'to pursue something successively', as illustrated by example (6). This perception-based meaning shift process is discussed in more detail by Springorum, Utt, et al. (2013).

(6) Sie klapperte die Geschäfte nach tollen Büchern ab.
she clattered the shops for great books [ab]
'She successively searched through the shops for great books.'

Particle meaning is not only influenced by its context, but also exerts an influence on the meaning of the context. For example, participants in a sentence generation experiment relying on systematically created PV neologisms (neoPVs) were asked to generate sentences for neoPV types such as *antöten* ([an]+'to kill') and *abschlafen* ([ab]+'to sleep'), without being provided any context (Springorum, Schulte im Walde, et al. 2013). The participants not only showed considerable agreement regarding potential neoPV meanings, but also often agreed in their strategy of dealing with particle senses in cases where the BV meaning did not fit the PV senses. This was the case for *antöten*, where 'to kill' introduces an absolute change of state, and the generated sentences mainly suggested an *an* meaning of partial affectedness, thus introducing quantification over event parts. This meaning typically cannot be applied to a verb with an absolute change of state, such as *töten*, but the participants obviously re-conceptualised the change-of-state BV *töten* as a process verb, which gradually approximates the final state of death. Often, adverbial specifications such as *fast tot* 'nearly dead' were added, which supported the above assumptions. The meaning components in the PV based on the BV were thus adjusted dependent on the particle meaning.

In sum, we define the meaning of a PV as either a direct composition of possible meaning components of particle and BV (if they are compatible), or alternatively as meaning-shifted particle and BV meaning components in strong interaction with the context. On the one hand, PVs can be assigned to discrete particle classes, based on semantically coherent groups of BVs, but on the other hand the classes need to be flexible to allow semantic changes if necessary. At first, these two options might seem contradictory, but from a diachronic perspective they reflect two natural processes of sense development. For example, according to Waldron (1979) "new words should first be used in rather specialised senses and subsequently be generalised" and "when such words have once achieved general status we use them without reflection upon their former restricted or technical sense". In addition, "the reverse process, in which a general word is given a special meaning in a restricted context, is just as common". In this sense, the polysemy of particles is considered a result of adjustment processes of basic meanings to recurring contextual conditions.

## **2.2 Spatial grounds of particle meanings**

As we are focussing on PVs with prepositional particles, we assume that particles are spatially grounded, similar to preposition meanings. Prepositions indicate spatial fundamentals, as discussed by Herskovits (1986) and Dirven (1993), among others. They structure the physical space and determine "language-specific concepts built up in mental space" (Dirven 1993). Similarly, Gärdenfors (2004) claims that prepositions are "primarily spatial relations" and create "spatially structured mental representations", when used with non-locational words. In order to structure space, it has to be perceived through our senses, with vision representing the predominant human sense (Viberg 1983).

Furthermore, Jackendoff (1983) understands "perception as an interaction between environmental input and active principles in the mind, that impose structure on that input". He demonstrates his view by ambiguous pictures from the school of Gestalt psychology. Lakoff (1987) refers to the "spatialisation of form hypothesis" by using the term *image schema*, which he defines as "schematic descriptions of meaning concepts". So perception of space cannot be separated from cognitive conceptualisation, and (meaning) concepts are often analogies of structures, to define space through perception. Although there are "significant differences between mental imagery and image schemas", according to Gibbs Jr. & Colston (1995) there is "good evidence that both spatial and visual representations exist for mental imagery".

We assume that prepositional particles – similarly to prepositions – introduce relations to structure space and to add verb-related meaning components, such as aspectual or temporal modifications. These relations can be captured by image schemas as "dynamic analogue representations of spatial relations as movement in space" (Gibbs Jr. & Colston 1995) to describe aspects of PV meaning. Accordingly, earlier investigations connect (spatial) concepts with phrasal verbs. Going beyond the already mentioned work by Lindner (1983), Morgan (1997) provides an extension for metaphorical readings of some *out* phrasal verbs. From a didactic point of view, Side (1990) and Abreu & Vieira (2008) discuss the advantages of using image schemas in order to learn phrasal verbs. In a psycholinguistic setting, Richardson et al. (2001) carried out experiments to show that basic images can be related to spatial and abstract verbal meanings.

A semiotic perspective of schematic descriptions is provided by Frutiger (1987), who defines the essential task of a schema as description with the help of literally pictured elements, to divide objects into different parts, instead of only using words.

## **3 Experiment**

This section presents the material, design, hypotheses and results of the experiment that collected human judgements on spatial aspects in particle meanings.

## **3.1 Material**

### **3.1.1 Verb data**

The German particle verbs for the experiment were generated systematically, based on a pre-selected set of base verbs and a pre-selected set of particles. We relied on base verbs from three different semantic domains, Machines and Tools (MnT), Force and Sound, which differ regarding their degree of concreteness. Furthermore, Kövecses (2002) categorises MnT and Force domains as common source domains for metaphors. The verbs belonging to the MnT domain (such as *hämmern* 'to hammer' and *schaufeln* 'to dig') are easy to imagine and represent very concrete BVs. In comparison, the verbs from the Force domain (such as *drücken* 'to press' and *quetschen* 'to squeeze') are less concrete, as the force itself is not perceivable directly, but only through interactions of its concrete entities encoded in the verb arguments. The verbs from the Sound domain (such as *schreien* 'to cry' and *jaulen* 'to yowl') represent intransitive verbs and define the most abstract source domain.

For each of the three domains we chose a total of ten base verbs that we considered not obviously ambiguous among the three classes, cf. the Appendix (page 32). These 30 BVs were then systematically composed into PVs using nine different prepositional particles. We only took into account particles that cannot also be used in German prefix verbs: *ab, an, auf, aus, ein, mit, nach, vor, zu*. In this way, we obtained 300 verbs (30 selected BVs and 270 generated PVs) as target verbs for the experiment. Due to the systematic composition of the PVs, PV neologisms (neoPVs) were also part of this data set. As part of the experiment tasks, the participants were thus asked to rate whether a PV was a neologism, such that our analyses can distinguish between existing PVs and PV neologisms. Approximately half of the PVs were rated as neoPVs (153 out of 270 PVs), see Section 3.2.
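The systematic composition can be illustrated with a minimal Python sketch. It only uses the six base verbs quoted as examples above (the full list of ten verbs per domain is in the Appendix); the function name `generate_pvs` and the dictionary layout are illustrative assumptions, not part of the original study materials.

```python
from itertools import product

# The nine prepositional particles named in the text.
PARTICLES = ["ab", "an", "auf", "aus", "ein", "mit", "nach", "vor", "zu"]

# Only the base verbs quoted as examples in the text; the complete set of
# ten verbs per domain is listed in the chapter's Appendix.
EXAMPLE_BVS = {
    "MnT": ["hämmern", "schaufeln"],
    "Force": ["drücken", "quetschen"],
    "Sound": ["schreien", "jaulen"],
}


def generate_pvs(bvs_by_domain, particles):
    """Combine every base verb with every particle (hypothetical helper)."""
    return [
        {"pv": particle + bv, "particle": particle, "bv": bv, "domain": domain}
        for domain, bvs in bvs_by_domain.items()
        for bv, particle in product(bvs, particles)
    ]


pvs = generate_pvs(EXAMPLE_BVS, PARTICLES)
print(len(pvs))            # number of PVs for the six example BVs
print(30 * len(PARTICLES)) # number of PVs for the full set of 30 BVs
```

With the full set of 30 base verbs, the same cross product yields the 270 generated PVs reported above, including forms that do not exist in German and are therefore candidate neoPVs.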

### **3.1.2 Concept images**

Although there are many semantic analyses based on concepts and frequently illustrated by visual schemas or pictographs, to our knowledge there is no general systematic standard available. We therefore decided to define visual representations for directional concepts from scratch. As a source of inspiration, we relied on Dreyfuss' symbol sourcebook, a very detailed collection of various kinds of symbols from many different areas (Dreyfuss 1972), and on a more descriptive sign derivation in Frutiger (1987). We defined the set of directional pictographs as shown in Figure 1. The pictographs were intended to be as simple as possible, in order not to distract from the actual information, but at the same time they should allow possibly alternative interpretations. We refer to our simplified pictographs as *concept images*.

Figure 1: Set of concept images.


Although the number of directions in space is infinite, a simplified conceptual reduction into a two-dimensional setting is in many cases sufficient, because "the salient dimensions of the world reinforce the horizontal and the vertical" (Tversky 2011). We therefore included vertical arrows for upward and downward directions (vert-up, vert-down), horizontal right and left directions (hori-right, hori-left), and also the four diagonal directions (dia-down-right, dia-down-left, dia-up-right, dia-up-left). To represent single object-oriented center-periphery directions as expansion or constriction, we use lines with arrow heads at both ends.

The outward-pointing arrow heads (vert-out, hori-out, dia-out-up-right, dia-out-up-left) correspond to expansion, and the inward-pointing arrow heads (vert-in, hori-in) correspond to constriction. To distinguish between asymmetrical and uniform center-periphery directions, two arrows with concentric curved lines were added (spiral-out, spiral-in). The total set of concept images contains 16 pictograms.
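For reference, the inventory described in the two preceding paragraphs can be written down as a small Python structure. The image names are those used in the text; the four group labels are editorial shorthand introduced here for illustration only.

```python
# Concept image names as given in the text; group labels are illustrative.
CONCEPT_IMAGES = {
    "single direction": [
        "vert-up", "vert-down", "hori-right", "hori-left",
        "dia-down-right", "dia-down-left", "dia-up-right", "dia-up-left",
    ],
    "expansion": ["vert-out", "hori-out", "dia-out-up-right", "dia-out-up-left"],
    "constriction": ["vert-in", "hori-in"],
    "uniform center-periphery": ["spiral-out", "spiral-in"],
}

# Flatten the groups into one list of all pictograph names.
ALL_IMAGES = [name for group in CONCEPT_IMAGES.values() for name in group]
print(len(ALL_IMAGES))
```

Counting the names confirms the total of 16 pictographs stated above.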

## **3.2 Design**

The experiment was performed as follows: The 300 verbs were distributed randomly over 6 lists with 50 verbs each. The random distribution was balanced for BVs vs. PVs, BV source domain, particle type and (non-)neologism<sup>1</sup>, such that each list contained equal proportions of these.
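The text does not specify how the balanced random distribution was implemented. One possible procedure, sketched here in Python purely as an assumption, shuffles the verbs within each stratum (particle type crossed with BV domain) and then deals them round-robin over the six lists. The verb names are invented placeholders, and balancing for (non-)neologism status is omitted because it would require the corpus-based pre-categorisation.

```python
import random
from collections import Counter

PARTICLES = ["ab", "an", "auf", "aus", "ein", "mit", "nach", "vor", "zu"]
DOMAINS = ["MnT", "Force", "Sound"]


def build_items():
    """Create 300 placeholder items: 30 BVs plus 270 PVs (names are invented)."""
    items = []
    for domain in DOMAINS:
        for particle in [None] + PARTICLES:      # None marks a bare base verb
            for i in range(10):
                base = f"{domain.lower()}_bv{i}"
                verb = base if particle is None else f"{particle}+{base}"
                items.append({"verb": verb, "particle": particle, "domain": domain})
    return items


def deal_into_lists(items, n_lists=6, seed=0):
    """Shuffle within each (domain, particle) stratum, then deal round-robin."""
    rng = random.Random(seed)
    strata = {}
    for item in items:
        strata.setdefault((item["domain"], item["particle"] or ""), []).append(item)
    ordered = []
    for key in sorted(strata):                   # domain-major, then particle
        rng.shuffle(strata[key])
        ordered.extend(strata[key])
    lists = [[] for _ in range(n_lists)]
    for index, item in enumerate(ordered):
        lists[index % n_lists].append(item)
    for lst in lists:
        rng.shuffle(lst)                         # random presentation order
    return lists


lists = deal_into_lists(build_items())
print([len(lst) for lst in lists])
print(Counter(item["particle"] for item in lists[0]))
```

Under this scheme every list receives the same number of base verbs and of PVs per particle, while the three domains are spread as evenly as the list size allows.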

Each verb was judged by ≈20 participants, non-experts (mostly students on campus), without payment. They were presented with a randomly ordered list of the target verbs (printed out or as a file), together with the concept images. For each verb, the participants were first asked to choose one of the following statements, to check whether they knew the PVs:


<sup>1</sup>At this point, we did not yet have human ratings for PV neologisms, so we used a precategorisation which considered a PV as neoPV if it did not appear in the *SdeWaC* web corpus containing 880 million words (Faaß & Eckart 2013).


These ratings provided a participant-dependent categorisation of PVs (and also of BVs, but those were not relevant for us) into existing PVs vs. PV neologisms on a four-point scale.

Then the participants were asked to mark those concept images which fit the meaning of the target verb. Multiple marks were allowed. We did not explicitly offer the option of selecting no concept image, because we wanted to enforce a selection. However, we asked the participants to describe an alternative image if they decided that none of our concept images fit. In that way they would only fall back on not providing any selection if they really could not settle on a concept image.

## **3.3 Hypotheses**

The main goal of our study was to investigate whether prepositional particles within German particle verbs can be associated with directional concepts, which are visually represented as concept images. As the basis for interpreting the experiment results, this section provides example-based and experience-driven hypotheses for the above-mentioned nine particle types regarding their most prevalent readings. Regarding the particles *ab, an* and *auf*, we additionally rely on detailed formal semantic analyses (Lechler & Roßdeutscher 2009; Kliche 2011; Springorum 2011).

Beyond discussing the primary concepts as originating from the spatial domain, we also include time in the interpretation of space, as "knowledge of space frequently comes from motion in time, from exploring environments and piecing together the parts" (Tversky 2011). Furthermore, following Boroditsky (2001), who analyses time with the help of spatial metaphors and for whom "concepts of space appear to be primary", we assume that concepts of time can be derived from concepts of space.

### **3.3.1** *ab*

*ab* has a basic meaning derived from the gravity force that causes objects to fall. This motion describes a downward directional meaning which may be represented by the concept image vert-down. An example is the particle verb *ablaufen* 'to run down' in example (7), where the downward meaning can only be contributed by the particle and not by the BV *laufen* 'to run'. In contrast, in example (8) with *absinken* 'to sink', the event of the BV *sinken* 'to sink' already introduces a downward direction. The difference between the BV and the PV meanings is that the BV events refer to an atelic continuous downward motion, as arising from the gravity force down direction, while the PV event is resultative, so a direction is spanned between the pre-state and the resulting state of the object affected by the gravity force. That is, in example (8) the pre- and result states are the locations of the ship before and after the sinking motion. The PV meaning is thereby almost synonymous with the BV meaning, which only describes the downward motion of the ship; the PV in addition introduces a result state.


The PV *abfallen* in example (9) also describes a downward direction with pre- and result states, regarding the button affected by gravity. Here, however, we find a further meaning component: the detachment of the button, a mereological part of the jacket, has to be caused by some force. This example (9) suggests that the particle *ab* may also contribute a separation meaning, which is – according to previous lexical semantic analyses – a productive reading for this particle (Kliche 2011). Often, it is not gravity but other intentional forces which are causing detachments, as in example (10) with the PV *abreißen* 'to pull off'. Here, the direction related to the force may even overwrite the basic downward direction of *ab*, which means that the particle only contributes the separation meaning component to the PV. The directions are explicitly specified through the semantics of the BV, through further contextual clues, or remain unspecified as in example (10). In addition to the gravity-dependent default direction described by vert-down, a "neutral", gravity-independent horizontal direction described by hori-right or hori-left might therefore represent an alternative choice of concept image.


The continuous variant of the discrete separation reading is the decrease of proximity reading which occurs with motion verbs as in example (11). Here the alignment with the conceptual direction of time becomes obvious. Similarly, in sentences as in example (12) with *absitzen* 'to wait/endure', lit. 'to sit off', in an abstract context, the spatially grounded basic concept has to be transferred to available abstract dimensions, which are different from space. Regarding *absitzen*, the abstract context *seminar* belongs to the time domain, so that the direction of the particle *ab* can conceptually only align with the conceptual orientation of the time dimension. The conceptual direction is thereby spanned between the starting point and an iteration of separations of mereological parts, which are time intervals. This leads to a progress reading which may be combined with a conceptually vertical value scale (Tversky 2011) to a value decrease meaning, as in example (13). Combining the progress and the value decrease dimensions, dia-down-right is another concept to be expected for the particle *ab*. This idea is comparable to Talmy's (2000) force dynamics, a conceptual notion of forces split up into different components and relations, which can be applied to various domains.


### **3.3.2** *an*

*an* introduces a direction which is force-independent in its primary meaning, as in example (14), repeated from example (3), with the PV *anschauen* 'to look at' derived from the perception BV *schauen* 'to look' in a spatial context. The direction of human sight – with a neutral head position which is horizontal by default – determines this conceptual direction. Given this, *an* can be represented by the concept images hori-right and hori-left.

(14) Karin schaut das Bild an.
    Karin look the picture [an]
    'Karin looks at the picture.'


In contexts with forces, e.g. as derived from motion, the particle *an* contributes an increase of proximity reading in analogy to the decrease of proximity reading of *ab*. Its direction is aligned with the direction of the goal of the motion expressed by its object to which the proximity is increased, cf. example (15) in comparison to example (11). Due to this goal we expect the concept image hori-right with the right-pointing arrow, since the future in Western cultures is conceptually located on the right of a horizontal timeline.

(15) Karin fährt Stuttgart an.
    Karin drive Stuttgart [an]
    'Karin drives towards Stuttgart.'

If the argument represents a concrete object, as *the street lamp* in example (16), repeated from example (5), the relation introduced by *an* can be understood as maximal proximity, such that there is a contact situation in the result state of the verb. In addition, we find readings as in example (17) with *anhämmern* 'to attach by hammering', where the particle *an* introduces a direction orthogonal to the vertical surface of a wall, again enforcing a contact reading. In comparison, examples (18) and (19) referring to horizontal surfaces – where the direction of *an* needs to be vertical – are only semi-acceptable. This strengthens the assumption that the basic conceptual direction of *an* is horizontal, and that hori-right and hori-left will be selected as predominant representations for this reading.


In example (20) with the PV *anfressen* 'to nibble' the particle *an* introduces a relation that identifies parts of the verbal object which are affected by the verb event. Here, the mouse nibbles only on some parts of the apple, scraping through the surface. Conceptually this is an extension of the maximal proximity reading, where the maximum is exceeded and results in a damaged surface of the direct object affected by the verb event. The meaning contributed by *an* is therefore a partial affectedness relation.

(20) Die Maus frisst den Apfel an.
    the mouse nibble the apple [an]
    'The mouse nibbles at the apple.'

In intransitive contexts with an abstract verb notion as in example (21) with the PV *anlaufen* 'to start' where the BV *laufen* 'to run' comes with its abstract and unspecific progress sense and therefore conceptually only provides the dimension of time, the particle *an* spans an abstract conceptual direction between the beginning of the time interval and an unspecified point later within this interval. In such cases, the conceptual direction of the particle is resolved to a meaning which refers to event initiation.

(21) Der Motor läuft an.
    the motor go [an]
    'The motor starts.'

In contexts as in example (22) where the BV *heizen* 'to heat' provides a value dimension, the conceptual direction of *an* is not only associated with the time dimension of the verb event but also with the vertical-value heat dimension. This means that the particle not only introduces the heating event initiation, but also a temperature rise along the timeline. This suggests that dia-up-right, the synthesis of hori-right and vert-up, is a suitable concept.

(22) Karin heizt den Ofen an.
    Karin heat the oven [an]
    'Karin heats the oven.'

### **3.3.3** *auf*

*auf*'s basic meaning represents the upward direction up, the opposite direction of the basic meaning of *ab* as derived from the directional alignment with the falling motion caused by gravity. That is, *auf*'s basic meaning is the direction derived from motions caused by forces which overcome gravity. This is the case in example (23), where the upward direction is a result of the gravity-countering shooting force.


(23) Das Wasser schießt auf.
    the water shoot [auf]
    'The water shoots up.'

Overcoming gravity often includes an elevation of an object, where a prominent position is more likely in the field of visual perception of an experiencer. Given this, the particle *auf* is also used to mark a coming-into-perception sense as in example (24), where startled birds suddenly become visually perceivable when they lift up from the ground.

(24) Karin schreckt die Vögel auf.
    Karin scare the birds [auf]
    'Karin startles the birds.'

The spatially derived basic up meaning can also refer to a sudden increase of noise, volume or pitch, when resolved in a Sound source-domain context, as in example (25). This mapping of spatial height to a scale is very productive, and often the particle contributes an increase meaning as in example (26). Therefore we expect the concept image vert-up to be associated with this particle.


If *auf* appears in contexts where it can only be applied to the time dimension, the spatially derived up is conceptually spanned between beginning and end of the time interval of the BV event. In this interpretation of the directional concepts the particle covers the whole event time interval (in contrast to *an*'s event initiation interpretation, which only covers the first parts of the event time interval but says nothing about the endpoint), so that its semantic contribution is a completeness reading as in example (27). The event duration is determined by the direct object, as in the consumption of a cookie in example (28). The scale adds a vertical value dimension to the horizontal time notion, measuring the progress and making dia-up-right a plausible concept image.

(27) Karin arbeitet die Aufgaben der letzten Woche auf.
    Karin works the tasks the last week [auf]
    'Karin finishes off the tasks of the last week.'


(28) Karin isst den Keks auf.
    Karin eats the cookie [auf]
    'Karin eats up the cookie.'

### **3.3.4** *aus*

*aus* typically refers to an expansion in the spatial domain, as illustrated by example (29). The growth of an object may also be conceptualised as direction originating from a point within the object, so overall the concepts vert-out, hori-out as well as dia-out-up-right, dia-out-up-left and spiral-out are legitimised.

(29) Das Universum dehnt sich aus.
    the universe expand itself [aus]
    'The universe expands.'

From an object-extrinsic perspective the particle introduces a specified closed area – conceptually understood as a container – to distinguish between an inside and an outside. With the help of an imaginary container concept, it is possible to relate our two-dimensional concept images to this particle meaning.<sup>2</sup> The concept image hori-right represents a plausible concept in order to describe the gravity-independent "default" direction pointing from an inside to an outside area. E.g., in (30) the concept image hori-right may indicate the pulling direction from the bed moved out of its box, the imaginary container.

(30) Karin zieht das Schlafsofa aus.
    Karin pull the sofa bed [aus]
    'Karin opens the sofa bed.'

### **3.3.5** *ein*

*ein* can introduce a shrinking or constriction of an object, as in example (31), and therefore be related to the inward-orientated concepts vert-in, hori-in as well as spiral-in. In analogy to the change from inside to outside described by *aus*, *ein* can also refer to a change from an outside to an inside area, as in example (32). This may be depicted with vert-down, again referring to an imaginary conceptual container representing the transition direction from the outside area to an inside, e.g. through the default opening of a container at the top.

<sup>2</sup>A more appropriate notion of containers requires a spatial concept with a higher dimensional complexity and is thus going beyond the scope of the current study.



### **3.3.6** *mit*

*mit* introduces a relation between two arguments of which one may be implicit, as in example (33). The particle does not provide additional information regarding these arguments, hence both symmetrical hori-in and hori-out concepts, which allow no inferences regarding an imbalance, are assumed possible representations for *mit*.

(33) Karin geht (mit ihrer Schwester) in das Schwimmbad mit.
    Karin go (with her sister) in the pool [mit]
    'Karin joins her sister to go to the pool.'

### **3.3.7** *nach/vor*

*nach* and *vor* introduce orderings in space which are gravity-independent and can therefore describe horizontal relations, suggesting hori-left and hori-right as their concepts. The main difference between *nach* and *vor* is their conceptual perspective on the one-dimensional ordering. *nach* focuses on something which can be conceptualised as following, as behind or as an end, cf. example (34), whereas *vor* focuses on a conceptual front or a beginning, as in example (35).


### **3.3.8** *zu*

*zu* provides a gravity-independent direction in the spatial domain similar to *an*, and in addition introduces an assignment or an intention. The assignment can be concrete, as in example (36), or abstract, as in example (37), whereas the intention meaning is always abstract, so that the particle's direction also tends to be abstract, as in example (38). We predict that the particle always originates from the spatial domain, and that dia-up-right therefore represents a plausible concept for this particle, because it is a synthesis of hori-right, the default direction, and vert-up, the goal representation. The fulfilment of an intention requires effort, i.e., a force, and therefore presupposes resistance. In analogy to *auf*'s counter-gravity direction, the direction introduced by *zu* is also a counter-direction facing resistance to reach the intended goal. Without further specification and with gravity as the default force to be overcome, the intention to reach a goal can conceptually be described with vert-up.
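The hypotheses of Sections 3.3.1 to 3.3.8 can be collected in a simple lookup table. The following Python sketch is an editorial reading of the concept images named as expected or plausible in each subsection, not a table provided by the study; the helper `is_predicted` is hypothetical.

```python
# The 16 concept image names used in the text.
ALL_IMAGES = {
    "vert-up", "vert-down", "hori-right", "hori-left",
    "dia-down-right", "dia-down-left", "dia-up-right", "dia-up-left",
    "vert-out", "hori-out", "dia-out-up-right", "dia-out-up-left",
    "vert-in", "hori-in", "spiral-out", "spiral-in",
}

# Concept images mentioned as expected or plausible per particle
# (editorial summary of Sections 3.3.1 to 3.3.8).
HYPOTHESES = {
    "ab": {"vert-down", "hori-right", "hori-left", "dia-down-right"},
    "an": {"hori-right", "hori-left", "dia-up-right"},
    "auf": {"vert-up", "dia-up-right"},
    "aus": {"vert-out", "hori-out", "dia-out-up-right", "dia-out-up-left",
            "spiral-out", "hori-right"},
    "ein": {"vert-in", "hori-in", "spiral-in", "vert-down"},
    "mit": {"hori-in", "hori-out"},
    "nach": {"hori-left", "hori-right"},
    "vor": {"hori-left", "hori-right"},
    "zu": {"dia-up-right", "vert-up"},
}


def is_predicted(particle, image):
    """Return True if the image is among those hypothesised for the particle."""
    return image in HYPOTHESES[particle]


print(is_predicted("auf", "dia-up-right"))
print(is_predicted("mit", "vert-up"))
```

Such a table makes it straightforward to check, for any particle, whether a frequently selected concept image was among those anticipated.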


## **3.4 Concept image selections**

In this section, we present an overview of the actual selections of concept images by our experiment participants, before Section 4 discusses them in light of the hypotheses just introduced. The dataset is publicly available at http://www.ims.uni-stuttgart.de/data/pv-ci.

### **3.4.1 Dataset**

As mentioned in Section 3.2, the 300 verbs were distributed randomly over 6 lists with 50 verbs each, and each list was judged by ≈20 non-experts. Given that participants might have refrained from judging a verb they did not know, the resulting distribution of the number of participant judgements over verb types differs slightly. Most of the verbs received between 16 and 20 judgements.

In total, we obtained judgements across 5,509 verb instances (including only those instances where at least one concept image had been chosen). Table 1 shows the number of concept images that were selected across verb instances. 3,192 (58%) of the target verb instances were assigned exactly one concept image; 1,556 (28%) received two concept images; 11% received three or four, and 2% were assigned between five and 16 concept images. Abstracting over target verbs to particle types, each of the nine particle types received between 540 and 560 judgements across concept images, i.e., we have a rather homogeneous number of concept images across particle types.

Table 1: Number of selected concept images per verb instance.


Figure 2 shows the average ratings of the degree to which the target verbs were (un)known to the experiment participants (cf. Section 3.2). Setting a threshold at 2.5, the middle of the scale 1–4, classifies 153 of the 300 target verbs as neologisms. All 30 base verbs were known to the participants and received an average rating >3.2. Figure 3 shows that the distribution of unknown vs. known PVs varies across the domains of their underlying BVs. PVs with Force and Sound BVs are more prominent among unknown PVs, while PVs with Machines and Tools BVs are more prominent among known PVs.
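The threshold-based categorisation amounts to averaging the familiarity ratings per verb and comparing the mean with 2.5. A minimal Python sketch, assuming that higher values on the four-point scale mean "better known" (as suggested by the base verb averages above 3.2); the ratings below are invented for illustration and are not the study's data.

```python
from statistics import mean

THRESHOLD = 2.5  # midpoint of the four-point familiarity scale (1-4)

# Invented ratings for illustration only; not taken from the dataset.
ratings = {
    "anschauen": [4, 4, 3, 4],
    "antöten": [1, 2, 1, 2],
    "abschlafen": [2, 1, 1, 1],
}


def classify(ratings_by_verb, threshold=THRESHOLD):
    """Label a verb 'known' if its mean rating reaches the threshold."""
    labels = {}
    for verb, scores in ratings_by_verb.items():
        labels[verb] = "known" if mean(scores) >= threshold else "neologism"
    return labels


print(classify(ratings))
```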

### **3.4.2 Concept image selection across particles**

The heat map in Figure 4 shows the preferences for selected concept images across particle types, calculated as follows. For each annotated verb instance we determined the proportion of selection for each concept image. For example, if two concept images were chosen by a specific participant and for a specific verb instance, each of the two concept images received a proportion of 0.5, and all others received proportions of 0. These proportions were then averaged over all PV instances with the same particle type, across participants. The color red indicates strong preferences of a specific concept image selected for a specific particle type, the color blue indicates weak preferences. Overall, the average preferences range from 0.004 to 0.214.
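The computation described above can be sketched in a few lines of Python. Each annotated instance distributes a total weight of 1 equally over its selected concept images, and these shares are averaged per particle type. The function name and the toy input are assumptions made for illustration; the published dataset may be structured differently.

```python
from collections import defaultdict


def preferences(instances):
    """Average per-instance selection proportions for each particle type.

    instances: iterable of (particle, list_of_selected_concept_images).
    Concept images that were never selected for a particle are simply
    absent from the result, which corresponds to a preference of 0.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for particle, selected in instances:
        if not selected:
            continue                      # instances without a selection are excluded
        counts[particle] += 1
        share = 1.0 / len(selected)       # e.g. two selected images get 0.5 each
        for image in selected:
            sums[particle][image] += share
    return {
        particle: {image: total / counts[particle] for image, total in images.items()}
        for particle, images in sums.items()
    }


# Toy input, invented for illustration.
toy = [
    ("auf", ["dia-up-right", "vert-up"]),
    ("auf", ["dia-up-right"]),
    ("ab", ["vert-down"]),
    ("ab", ["vert-down", "dia-down-right"]),
]
prefs = preferences(toy)
print(prefs["auf"])
print(prefs["ab"])
```

Because every instance contributes a total weight of 1, the preferences of each particle sum to 1 across the 16 concept images, which makes the rows of the heat map directly comparable.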

The heat map demonstrates that the particles exhibit clearly different concept image profiles. The particle *auf*, for example, achieved the overall strongest preference of 0.214 for the concept image dia-up-right, and a preference of 0.136 for vert-up. *ab* shows preferences of ≥0.150 for the concepts dia-down-right and vert-down. *an*, *nach* and *vor* are associated most strongly with hori-right (preferences 0.138–0.167), *aus* with spiral-out (preference 0.157), *ein* with spiral-in and vert-down (preferences 0.162 and 0.142, respectively), *mit* with hori-out (preference 0.158), and *zu* with hori-in (preference 0.171).

Figure 3: Unknown/known target particle verbs across domains.

Figure 4: Concept image selection across particle types.

### **3.4.3 Concept image selection across existing PVs and PV neologisms**

The heat maps in Figure 5 specify the particle selections of concept images from Figure 4 regarding the participants' ratings of PV knowledge. That is, the upper plot in Figure 5 shows concept image preferences across particles for well-known PVs with an average rating ≥2.5, and the lower plot shows concept image preferences across particles for rather unknown PVs with an average rating <2.5.


Figure 5: Concept image selection across particles and (un)known PVs.

We expected to see more strongly associated concept images for particles in rather unknown PVs (referring to some predominant meaning contribution(s)). This is the case for the majority of particle types (e.g., dia-down-right for *ab*; hori-right and hori-in for *an*; spiral-in for *ein*; hori-right for *nach* and *vor*; hori-in for *zu*) but not for *auf, aus* and *mit*. The figure nevertheless indicates that the concept image selections are largely stable for well-known vs. unknown particle verbs, i.e., the strongest preferences of particle types regarding concept images show up in both heat maps.

### **3.4.4 Concept image selection across BV source domains**

Figures 6 and 7 look into concept image selection across BV source domains. Figure 6 presents the average preferences of selected concept images per domain across all particle types. It shows that the base verbs already exhibit clearly different concept image profiles when taking into account the respective source domain. For Force BVs, the inward-pointing concept images hori-in (0.221) and vert-in (0.125) received the strongest preferences; for MnT BVs, the concept image vert-down (0.154) received the predominant amount of selections, followed by a set of concept images with preferences of ≈0.100–0.110: spiral-out, vert-out, dia-down-right and hori-out, favouring downward- and outward-pointing arrow types while being rather flexible, i.e., with less strong overall preferences; for Sound BVs, the strongly favoured concept image is spiral-out (0.288), with a set of secondary selections for hori-out (0.134), spiral-in (0.109) and vert-out (0.097), favouring spiral-shaped and outward-directed arrows.

Figure 6: Concept image selection across base verbs, with reference to their domains.

Figure 7 demonstrates that the BV patterns across concept images are partly preserved and partly over-written when combining the BVs with specific particles. PVs composed of Force BVs and particles *an, ein, mit, zu* inherit the strong preference for hori-in. Similarly, PVs composed of Sound BVs and particles *aus, ein, mit, nach, zu* inherit the strong preferences for spirals from the BVs, with *an, nach, vor* at the same time showing strong preferences for hori-right. For PVs composed of MnT BVs, where the concept image preferences for the BVs were already less skewed than for the other two domains, the respective PVs likewise do not seem to exhibit specific domain-dependent concept image preferences.

Figure 7: Concept image selection across particles and BV domains.

Across domains, the PVs with particles *ab, auf, nach, vor* appear to contribute rather constant meaning components: the most strongly selected concept images tend to be consistent across BV source domains and largely correspond to the overall strongest concept images in Figure 4. PVs with particles *an* and *auf* represent constants in a different way: in comparison to the other particle types, they seem to be more flexible in their meaning contribution, i.e., they do not show particularly strong preferences for specific concept images but similarly strong preferences for a range of concept images. Nevertheless, even these more constant particle meanings are influenced by the BV domains; for example, *ab* shows a strong preference for spiral-out when combined with Sound BVs; *an* shows a strong preference for hori-in when combined with Force BVs; *auf* shows a strong preference for vert-up when combined with Sound BVs and only a loose preference for dia-up-right when combined with Force BVs; *nach* and *vor* show strong preferences for spirals when combined with Sound BVs and no strong preferences when combined with MnT BVs.

## **4 Discussion**

In the remainder of this article, we relate the analyses in the previous section back to our hypotheses about particle meanings and particle concepts (Section 4.1) before we explore the role of the BV source domains (Section 4.2) and go into detailed meaning investigations regarding the particle *ab* (Section 4.3).

## **4.1 General analysis of particle concept hypotheses**

The experiment participants associated *auf* and *ab* with the vertical arrows (vert-up and vert-down) and also with the corresponding diagonal versions pointing to the right: dia-up-right and dia-down-right, as predicted. The respective diagonal arrows pointing to the left were not chosen, which is an indication of the involvement of the horizontal time dimension. The particle *an* was, as predicted, most strongly associated with the hori-right concept image; the additionally predicted dia-up-right concept image achieved a secondary preference.<sup>3</sup> *nach* and *vor* were strongly associated with the hori-right concept image, which again indicates a reference to the time dimension. Since most *nach* and *vor* readings have a temporal component, a derivation of the basic particle concept from the time domain instead of the space domain should be considered as an explanation.

In the case of *aus*, the spiral-out concept image was selected most often. This can be explained by the strong association of the particle's prevalent meaning with a *container* image schema, which is necessary for assigning a direction to *aus*. Since this experimental concept image setting consisted only of two-dimensional arrows, we can however only speculate about the relevance of the container representation. In contrast, *ein* was – in accordance with our assumptions – associated with vert-down (next to spiral-in), although these directions also require the notion of a container. In order to conceptualise an outside area, as necessary for many *aus* PV readings, it might be sufficient to think of a single wall in order to distinguish between an outside and an inside area. This could explain why *ein* received – in contrast to *aus* – stronger preferences for the predicted concept images based on the constraint of the existence of an imaginary container.

The particle *mit* was most strongly associated with the hori-out concept image, in accordance with our assumptions. *zu* was not linked to dia-up-right, which we considered as a possible concept representation for the intentional readings with an abstract goal. The strongest selection was in favour of the double-arrow concept image hori-in, followed by spiral-in, thus suggesting that a different sense of the particle was more salient in the contexts of the selected BVs. For example, for the PVs *zuzerren* 'to drag until closed' and *zustopfen* 'to plug' the particle introduces a closure relation, which is connected to the also chosen vert-down. However, the *zu*-PVs based on the abstract Sound BVs were also associated with these concept images, which at first sight does not fit the concrete closure notion. Here, it seems to be more likely that the selection for spiral-in does not represent the particle meaning, but the meanings of the Sound BVs. Together with the choice of hori-in as in *zudröhnen* ([*zu*]+'to drone/get stoned'), this points to an interpretation of *zu* as an abstract closure, where the closure is understood as the impairment of auditory perception, as realised through the very dominant and constant sound provided by the BV *dröhnen*. In this interpretation, each arrow head of hori-in conceptually points to one ear.

<sup>3</sup>Our results regarding *auf* and *an* are also in accordance with the insights of a lexical decision experiment presented by Frassinelli et al. (2017), which indicated that the particles have a predominant vertical/horizontal directionality, respectively.

## **4.2 Analysis of BV source domains**

Figure 6 suggested that the BV source domains were associated with different preferences for concept images, although none of the BV classes is directional from a lexical semantic perspective. We believe that the associations between source domains and concept images thus indicate conceptual relations to directionality.

The MnT domain with its concrete BVs provides strong preferences for the concepts vert-down, dia-down-right, vert-out, hori-out and spiral-out. The associations with hori-out and spiral-out can be explained with the visually clearly defined and easily imaginable manners of movement of the BVs *schleifen* 'to sand', *sägen* 'to saw', *spitzen* 'to sharpen', etc., whereas the associations to vert-down, vert-out and dia-down-right can be traced back to the manners of movements of *hämmern* 'to hammer', *graben* 'to dig', *schaufeln* 'to shovel/dig', etc. However, the question arises why only the downward-pointing concept images were chosen and not the upward-oriented ones. We approach the question on a theoretical semantic basis. The BVs are denominal action verbs, either derived from an instrument (such as a shovel, a hammer, a fork) or from an intended result (such as a grave), and describe a repetitive motion. The involved motion has at least two changes of direction, marking the extreme points of the movement. The direct objects of MnT verbs typically refer to one of those extreme points, as in example (39), where *schaufeln* refers to the area beneath the ground which lies below our usual perceptual horizon. This idea corresponds to the research of Lachmair et al. (2016), which shows that words trigger specific spatial locations. Other frequent arguments of *schaufeln*, such as *hole* and *soil*, also refer to such a "down" area, as in examples (40) and (41). Here, the motion is spanned between the initial position of the instrument and the position of the affected area. In examples (39–41), the direction of the shovel motion is defined between the initial "up" location of the shovel and the "down" location of the ground, thus justifying the downward concept images over upward concept images.


(41) Karin schaufelt Erde.
     Karin shovel soil
     'Karin shovels soil.'

In contrast, the Sound BVs, which are the most abstract verbs in this data set, were not linked to many of our simple directional concept images. They were mainly associated with the spirals, thus suggesting a mental mapping to the prototypical picture of a sound wave. That is, the underlying idea of the spiral as concept representation was a uniform expansion, which matches the motion behaviour of sound waves. In addition, there was some preference for the double-headed arrows hori-out and vert-out as concept images for the BVs with a repetitive sound character. This can be attributed to the strongly prototypical manner of sound production actions, which are usually caused by an up-and-down motion as in drumming, or a left-to-right motion as in clapping. This means that the Sound BVs, which are not directional from a lexical semantic perspective, were analysed as conceptually directional. This clear-cut mapping between spiral and sound wave, as well as between double-headed arrow and manner-of-production of repetitive sounds, allows us to distinguish between the concept images triggered by the BVs and the concept images triggered by the particle. This provides insight into the composition process and explains the low compatibility between particle types and Sound BVs, as reflected in the high number of neoPVs in Figure 3 (page 19).

The Force BVs describe events which are mainly defined through the interplay of two concrete arguments. In comparison to MnT verbs, the Force verbs are less concrete, but at the same time they are also less abstract than the Sound verbs. The importance of the arguments shows up in the preference for the concepts hori-in and vert-in, which both have two arrow heads. The concept images are thereby similar to the vectors used in the schematic representations of forceful verbs by Zwarts (2010).

## **4.3 Analysis of particle** *ab*

In the last part of our analyses we focus on concept image preferences regarding one specific particle type. We choose *ab*, the particle which is strongly associated with a downward direction.

Figure 8 shows the distribution over concept images for PVs with particle *ab* across BV source domains. In all three domains, the participants agreed on the two down concepts (i.e., vert-down and dia-down-right), although the PVs in the experiment were assigned to different lexical semantic classes by Kliche (2011).


Figure 8: Concept image selection for *ab* across BV domains.

Looking into specific PVs with strong preferences for the two down concept images, an example instance of an unknown PV is represented by *abhämmern* ([ab]+'to hammer'), cf. example (42). We assume that this PV was understood as a separation performed by a hammering force. *abquetschen* 'to squeeze off' in example (43) is an instance of a well-known PV where the particle is combined with a Force BV, describing a force that causes a separation. The well-known PV *abklingen* combines the particle with a Sound BV; literally, it describes that a sound fades away, but it is more common in its metaphorical reading of approaching the end of an event together with a value decrease, as in example (44). The approaching of the end of the storm can be conceptualised as decreasing intensity within both the value and the time dimensions, or can alternatively be interpreted only temporally, as a slowly ending process. However, in comparison to the previous examples no causer is involved, suggesting that the downward meaning is conceptually connected to *ab*, even if from a lexical semantic perspective only the result is expressed.


The examples illustrate that even though the contexts are rather different, the meanings of the particle can in all cases be traced back to a downward direction, either causing or being caused by a separation, varying according to the constraints. We argue that the downward concept is not only the basic meaning component, but the prototypical reading for the particle *ab*.

## **5 Conclusion**

In this article, we have demonstrated that directional concepts, visually represented as arrow pictographs, can be applied to a systematically composed set of German particle verbs and their underlying base verbs. Furthermore, the selected concept images were mostly in accordance with the particle directions predicted on the basis of example sentences, lexical-semantic classifications and spatial experience, and largely stable for well-known vs. unknown particle verbs. Thus, direction is a concept that should be taken into account as a part of the PV composition process and the contribution of the particle to the particle verb meaning.

Understanding potential particle fundamentals as concepts, instead of meanings, has the advantage that senses are not considered as discrete, static classifications requiring plenty of compromises or borderline cases. Concepts as basic components are flexible and can easily be adjusted to various contexts. Thereby, classes of similar contextual requirements trigger similar concept adjustments, and hence are assumed to enforce a specific particle sense.

## **Acknowledgements**

The research was supported by the DFG Collaborative Research Centre SFB 732 (Sylvia Springorum, Sabine Schulte im Walde) and the DFG Heisenberg Fellowship SCHU-2580/1 (Sabine Schulte im Walde).

## **References**


Frutiger, Adrian. 1987. *Der Mensch und seine Zeichen*. Wiesbaden: Marix Verlag.


*ciation for Computational Linguistics: Human Language Technologies*, 150–156. New Orleans, LA, USA.


## **Appendix**

Table 2: Selected 30 base verbs and their source domains. All these base verbs were systematically composed to a total of 270 particle verbs by prefixing them with the nine constituent particle types *ab, an, auf, aus, ein, mit, nach, vor, zu*.


## **Chapter 2**

## **Do semantic features capture a syntactic classification of compounds? Insights from compositional distributional semantics**

## Sandro Pezzelle

Institute for Logic, Language and Computation, University of Amsterdam

## Marco Marelli

Department of Psychology, University of Milano-Bicocca

Classifying compound words has been the ultimate goal of much research in formal linguistics. A popular, cross-linguistically applicable classification (Bisetto & Scalise 2005) distinguishes three main types of compounds, namely Subordinate, Attributive, and Coordinate on the basis of the underlying *syntactic* relation between the compound elements. Similar tripartitions have also been proposed in cognitive psychology by works exploring conceptual combination. Focusing on the type of *semantic* interpretation assigned to novel combinations, three main classes have been traditionally described, namely Relation-linking, Property-mapping, and Hybrid or Conjunctive (see Wisniewski 1996). Based on these commonalities, we conjecture that syntax-based compound types might also be explained by means of the semantic properties of the compound and its constituents. Using a compositional model of distributional semantics (cDSM), we show that (a) the contribution of each constituent in determining the meaning of the compound and (b) the semantic similarity between the two constituent words are significant predictors of these classes. These findings suggest that the various compound types identified by syntactic criteria can also be predicted by means of semantic features. On the one hand, this confirms the validity of the proposed linguistic categorization. On the other hand, we bring further evidence proving the effectiveness of cDSMs in describing linguistic phenomena.

Sandro Pezzelle & Marco Marelli. 2020. Do semantic features capture a syntactic classification of compounds? Insights from compositional distributional semantics. In Sabine Schulte im Walde & Eva Smolka (eds.), *The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective*, 33–60. Berlin: Language Science Press. DOI:10.5281/zenodo.3598556

## **1 Introduction**

## **1.1 Classifying compounds**

Compounding, namely the mechanism by which two independent words (e.g. *pet*, *food*) combine to form a novel morphologically complex word (e.g. *petfood*), is one of the most extensively covered topics in the literature on word formation.<sup>1</sup> On the theoretical level, many linguists have been particularly interested in classifying *compounds* according to various criteria, such as "headedness" (roughly speaking, the position and the characteristics of the compound *head*, the dominant word in the compound, e.g. *food* in *petfood*) (Bloomfield 1933; Fabb 1998); the presence of a verb or a deverbal noun (Marchand 1969); the kind of underlying relation between the *constituent* words, either at a syntactic level (Bloomfield 1933; Bally 1950; Lees 1960; Bisetto & Scalise 2005; Baroni et al. 2009; Dressler 2006; Scalise & Bisetto 2009) or at a semantic level (Levi 1978; Warren 1978; Fanselow 1981). Though different and pertaining to somewhat different levels of analysis, these criteria have been traditionally explored and mixed together within the same classification framework (see among others Bauer 2001; Haspelmath 2002; Booij 2005). As a consequence, many influential proposals distinguish various classes of compounds on the basis of several overlapping properties that often generate an inconvenient number of subclasses and special cases.

To overcome this issue, Bisetto & Scalise (2005) proposed a cross-linguistic (and nowadays widely accepted) classification framework based on a single, homogeneous criterion, that is, the underlying syntactic relation between the compound constituents. Three main classes of compounds are isolated, namely Subordinate, Attributive, and Coordinate. To illustrate, the compound *doghouse* belongs to the Subordinate class, since the syntactic relation subtending *dog* and *house* is that of subordination. Indeed, the compound can be paraphrased as 'the house of the dog'. In contrast, *swordfish* is labeled as Attributive, given that the first constituent, *sword*, acts as an attribute of *fish* (a *swordfish* is 'a fish whose nose is shaped like a sword'). Finally, Coordinate compounds are formations like *comedy-drama*, where the first and the second constituent are linked by the underlying conjunction 'and'.

## **1.2 From** *word* **combination to** *conceptual* **combination**

Interestingly, a similar tripartition has been proposed in the cognitive psychology literature by works on *conceptual combination* (Wisniewski 1996; Costello & Keane 2000), where the focus is on the type of interpretations provided by people to novel combinations. By analyzing the circumlocutions produced by speakers to interpret novel compounds like *zebra-horse*, three main classes have been traditionally isolated, namely Relation-linking, Property-mapping, and Hybrid or Conjunctive. The first class includes interpretations involving a relation between the two concepts, e.g. a *zebra-horse* is 'a horse that preys on zebras'. In the second, a property of one concept is mapped to the other, e.g. a *zebra-horse* is 'a striped horse'. In the third, the novel concept is interpreted as a hybrid or conjunction of the constituent concepts, e.g. a *zebra-horse* is 'a creature having many properties of both horses and zebras'. Though the aim of these works is to study the various interpretations of novel conceptual combinations, without any interest in recognizing classes of *lexicalized* compound words, the types they identify are reasonably comparable to the linguistic ones proposed by Bisetto & Scalise (2005). In particular, Relation-linking interpretations correspond to compounds included in the Subordinate class, Property-mapping to Attributive, and Hybrid/Conjunctive to Coordinate.

<sup>1</sup> For a complete and exhaustive overview of compounding, see Lieber & Štekauer (2009).

A notable difference is that the *linguistic* classification accounts for lexicalized (or familiar) compounds, whereas the *cognitive* one describes novel combinations which still lack a single, well-defined interpretation. However, we can easily assume that lexicalized compounds are the linguistic realization of a conceptual combination process, in the sense that all compounds start out as novel formations and become lexicalized through usage over time (Gagné & Spalding 2006). Consistent with this claim is recent evidence showing that, in the processing of both novel and familiar compounds, an active combination of constituent meanings is routinely in place (Gagné & Spalding 2009; Ji et al. 2011; Marelli & Luzzatti 2012; Marelli et al. 2014). This would suggest that the difference between novel and familiar compounds is merely in their degree of lexicalization. While the former can still be interpreted by speakers in various ways, the latter have only one possible interpretation, which the classification by Bisetto & Scalise (2005) describes in terms of a fixed syntactic relation between the compound's constituents.

The second important difference is that interpretations of novel combinations pertain to the conceptual level, that is, they describe relations between the concepts being combined together. As such, the tripartition described above is essentially *semantic*. In contrast, the linguistic classification considered here is based on a purely *syntactic* criterion. Based on the commonalities highlighted above, however, it might be that the two levels of analysis are not mutually exclusive, but possibly related and somehow overlapping. Lexical semantic approaches corroborate this conjecture. Lieber (2009), for example, proposed that the different compound types identified by Bisetto & Scalise (2005) depend, at least in part, on the intrinsic semantic features of the compound constituents. Moreover, classifications of compounds based on taxonomies of semantic relations reveal a certain degree of overlap between the syntactic and the semantic analysis (Levi 1978). For example, the semantic relation AND seems hardly distinguishable from the purely syntactic relation of coordination, which is subtended by the underlying conjunction 'and'.

## **1.3 Aim of the work**

Based on this concurring evidence, we conjecture that various classes of compounds defined at the syntactic level may be also explained in terms of the semantic properties of the compounds and their constituents. In particular, our hypothesis is that measures quantifying the semantic role played by each constituent in contributing to the overall compound meaning, as well as the degree of semantic similarity between the constituents, should be effective in predicting different classes. Moreover, we expect these semantic measures to be able to capture different, syntax-based classes without relying on other non-semantic properties of compounds. Crucially, we do not claim that the distinction is thus purely semantic, making superfluous any categorization focusing on the syntactic relation between the compound constituents. Rather, we believe that the theoretically motivated and widely accepted *discrete* classifications proposed by linguists can be also described in terms of the *continuous*, quantitative aspects of the meaning of compounds and their constituents. In other words, we expect the *quantitative* semantic properties to parallel the *qualitative* grammatical distinctions, thus demonstrating, at the same time, the effectiveness of our proposal and the validity of the linguistic theory.

We experiment with a dataset of English compounds for which annotation based on the classification by Bisetto & Scalise (2005) (Subordinate, Attributive, Coordinate) is available. To predict each class, we use several semantic variables such as the degree of similarity between the constituents and the individual contribution of each constituent word in determining the meaning of the whole compound. We quantify these measures by using a compositional model of distributional semantics (Baroni & Zamparelli 2010; Guevara 2010; Mitchell & Lapata 2010; Zanzotto et al. 2010), following recent evidence proving the effectiveness of this approach in modeling morphological processes such as composition and derivation (Marelli & Baroni 2015; Günther & Marelli 2016; Marelli et al. 2017).

## **1.4 Computational models of meaning**

Based on the core notion that similar words occur in similar contexts (Harris 1954; Firth 1957), distributional semantic models (henceforth, DSMs) represent lexical meanings by means of vectors encoding the contexts in which words appear in a large corpus. The intuition is that words that occur in similar linguistic contexts (e.g., *cat* and *dog*) should be semantically more similar than words that do not. Typically, this geometric representation is used to quantify the degree of distributional similarity between two words. Given the corresponding vectors, the similarity is computed in terms of their geometric proximity, typically the cosine of the angle between them (Turney & Pantel 2010). In particular, the closer two vectors are in the semantic space (i.e., the space populated by all the linguistic vectors), the higher their similarity. Traditional DSMs, such as the pioneering Latent Semantic Analysis (LSA; Landauer & Dumais 1997), have been widely used to obtain quantitative estimates of important semantic variables such as the degree of conceptual or topical similarity between two words (Padó & Lapata 2007; Gagné & Spalding 2009; Kuperman 2009; Wang et al. 2014).
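The cosine measure mentioned above can be illustrated with a minimal sketch. The three-dimensional vectors below are invented for illustration and are not derived from any corpus; the function name `cosine_similarity` is likewise only a label chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (assumption: hand-made values, not real distributional data).
cat = [2.0, 1.0, 0.0]
dog = [1.0, 2.0, 0.0]
car = [0.0, 0.0, 3.0]

sim_cat_dog = cosine_similarity(cat, dog)  # vectors point in similar directions
sim_cat_car = cosine_similarity(cat, car)  # vectors are orthogonal
```

In this toy setting, *cat* and *dog* share most of their mass on the same dimensions and therefore receive a higher cosine than *cat* and *car*.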

## **1.5 Distributional semantics and compounds**

In the domain of compounds, distributional semantic approaches have been extensively applied to two main tasks: noun-noun compound interpretation (Van de Cruys et al. 2013; Dima & Hinrichs 2015; Dima 2016; Shwartz & Dagan 2018; Fares et al. 2018) and compositionality prediction (Reddy et al. 2011; Schulte im Walde et al. 2013; Salehi et al. 2014; 2015; Cordeiro et al. 2016). The former task, usually tackled as a classification problem, aims at automatically predicting the *semantic* interpretation of the compound (i.e., the semantic relation between the constituents). Given the compound *street protest*, for example, a system is trained to predict that the relation holding between the nouns is 'locative'. Several datasets of compounds annotated with different numbers of semantic relations have been released for the task (Ó Séaghdha 2007; Tratz & Hovy 2010), and various systems capitalizing on distributional representations (usually obtained with neural network architectures; see Section 2.1) have been recently proposed. Overall, this approach has proved successful in the task, though the performance is shown to be dependent on the number and granularity of semantic relations. The latter task is focused on predicting the degree of compositionality of a noun-noun compound, namely the extent to which the meaning of the whole depends on the meaning of the constituent words. Various datasets annotated with human judgments have been proposed over time (Reddy et al. 2011; Roller et al. 2013; Farahmand et al. 2015), and extensive explorations of DSMs in the task have been carried out. Crucially for the purpose of this study, distributional measures of similarity obtained with compositional approaches were found to be highly predictive of human judgments in this task (Reddy et al. 2011; Schulte im Walde et al. 2013; Salehi et al. 2015; Cordeiro et al. 2016).

## **1.6 A compositional approach to compounds**

Of great interest for the present work, Lynott & Ramscar (2001) were the first to employ distributional semantic models to study novel compounds (e.g. *zebra-horse*). In particular, the aim of that work was to test whether a measure of semantic similarity between compound constituents (quantified with LSA) was predictive of both (a) the ease of novel compound comprehension and (b) the distinction between Relation-linking and Property-mapping combinations. To do so, they experimented with novel compounds and their corresponding interpretations as provided by previous works on conceptual combination (Wisniewski & Love 1998; Gagné 2000). Overall, the model was shown to perform remarkably well in all the tasks. Lynott & Ramscar (2001), however, claimed that current distributional models like LSA were not capable of modeling the whole process of conceptual combination. Since they can only quantify the similarity between independent, free-standing words (e.g. *zebra* and *horse*), they are not informative at all about the relation between these words and the resulting compound. As such, they represent static, word-based models of lexical semantics which do not account for the potentially infinite linguistic productivity.

Compositional DSMs (henceforth, cDSMs) tackle precisely these issues. Aimed at accounting for the compositional nature of language (Baroni, Bernardi, et al. 2014), these models capitalize on DSM vectors and perform either simple (Mitchell & Lapata 2010) or more complex, theoretically inspired operations (Baroni & Zamparelli 2010; Guevara 2010; Zanzotto et al. 2010) to *compose* existing lexical entries. By exploiting simple operations (sum, multiplication) or being trained with distributional information about combinations that are already observed in the source corpus, these models can indeed be used to generate meaning representations for both novel and lexicalized formations. Recently, this approach was shown to be effective in modeling morphological processes such as derivation and compounding (Marelli & Baroni 2015; Günther & Marelli 2016; Marelli et al. 2017). Closely related to the present study, recent work (Günther & Marelli 2016; Marelli et al. 2017) exploited cDSMs to generate compositional representations of compounds. Marelli et al. (2017), in particular, explored whether a simple but effective regression-based compositional method (Guevara 2010) can capture the variability in semantic relations between the constituents of novel compounds. This system was shown to be remarkably effective and flexible in capturing relational information. Based on this evidence, in the present work we employ the same model and test it in the task of predicting theoretically motivated, syntax-based classes of compounds.
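To make the contrast between simple and trained composition operations more concrete, the following sketch shows the additive and multiplicative operations on toy vectors, together with a weighted linear form c = A·u + B·v. The linear form is used here only as an assumed simplification in the spirit of regression-based composition: in a trained model the matrices would be estimated from corpus data, whereas `A` and `B` below are hand-picked, and all vectors are invented for illustration.

```python
def compose_additive(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def compose_multiplicative(u, v):
    """Element-wise product of two vectors."""
    return [a * b for a, b in zip(u, v)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def compose_linear(A, B, u, v):
    """Weighted linear composition c = A·u + B·v (A, B assumed to be given)."""
    return compose_additive(matvec(A, u), matvec(B, v))


# Toy constituent vectors (assumption: not corpus-derived).
dog = [1.0, 0.0, 2.0]
house = [0.0, 3.0, 1.0]

# Hand-picked weight matrices (assumption): the modifier is down-weighted,
# the head is passed through unchanged.
A = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

doghouse_add = compose_additive(dog, house)
doghouse_mult = compose_multiplicative(dog, house)
doghouse_lin = compose_linear(A, B, dog, house)
```

Unlike the sum, the linear form is sensitive to constituent order, since swapping the two constituents applies different weights to each of them.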

## **2 Experiment**

The present experiment investigates whether different, syntax-based classes of compound words (Subordinate, Attributive, and Coordinate) can be captured by means of semantic properties of the compound and its constituents. To quantify these properties, (a) we generate compositional representations of compounds and obtain similarity scores assessing the role of each constituent in contributing to the overall meaning; (b) we measure the degree of similarity between the first and second constituent.
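As a rough illustration of measures (a) and (b), the sketch below computes three cosine-based scores for one compound: the similarity of the composed vector to each constituent and the similarity between the two constituents. The feature names, the helper `similarity_features` and the toy vectors are assumptions made for this example and do not reproduce the variables of the actual experiment.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_features(first, second, composed):
    """Return three semantic scores for one compound (hypothetical names)."""
    return {
        # (a) role of each constituent in the composed meaning
        "first_contribution": cosine(composed, first),
        "second_contribution": cosine(composed, second),
        # (b) similarity between the two constituents
        "constituent_similarity": cosine(first, second),
    }


# Toy vectors (assumption): the composed vector lies closer to the second constituent.
first = [1.0, 0.0]
second = [0.0, 1.0]
composed = [1.0, 3.0]

features = similarity_features(first, second, composed)
```

Scores of this kind could then serve as predictors in a statistical model of the compound classes.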

A note on the terminology used in the paper. Until this point, we used the neutral terms "first constituent" and "second constituent" to refer to, respectively, *dog* and *house* in *doghouse*. As briefly mentioned in Section 1.1, one constituent usually plays a dominant role compared to the other since it acts as the "head" of underlying phrase. In this example, the head is clearly *house* (indeed, *doghouse* is 'the house of the dog'). Consistently, this element determines the syntactic category of the phrase and, semantically, it represents a hyperonym of the compound. By default, in English compounds the second constituent acts as the compound "head", whereas the first acts as the compound "modifier" (Bauer 2009). We stick with this arguably simplified terminology<sup>2</sup> and, from now on, we interchangeably use the terms "first constituent" or "modifier" to refer to the leftmost element, "second constituent" or "head" to refer to the rightmost one.

## **2.1 Semantic space**

Following Baroni, Dinu, et al. (2014), who demonstrated that DSMs generated using feedforward neural network models largely outperform traditional countbased architectures in many tasks, we built a state-of-the-art CBOW semantic space using the word2vec toolkit by Mikolov et al. (2013), with all the parameters that turned out to be best-predictive in Baroni, Dinu, et al. (2014). In particular, the

<sup>2</sup>Without going into much detail, it should be mentioned that this picture is indeed less straightforward than it may appear. For instance, in the English compound *singer-songwriter* the two constituents play a similar role, in a way that they could be both considered as the compound "head" (and the compound as "double-headed") (Bauer 2009).

### Sandro Pezzelle & Marco Marelli

vectors have 400 dimensions and were built using (a) a context window of 5 words to either side of the target word, (b) a subsampling procedure which penalizes high-frequency words in the training phase ($t = 1 \times 10^{-5}$), (c) 10 negative samples. The vectors were trained using a corpus of written English containing around 2.8 billion tokens (a concatenation of BNC, ukWaC, and a 2009 dump of Wikipedia), the same used in Baroni, Dinu, et al. (2014). To avoid sparsity effects, we experimented with the vectors corresponding to the 300k most frequent words in the corpus.
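To make the effect of the subsampling threshold concrete, the following minimal sketch uses the discard formula described in the word2vec literature, P = 1 − sqrt(t / f), where f is a word's relative corpus frequency. The function name and the example frequencies are hypothetical, and the released word2vec code may apply a slightly different variant of this formula.

```python
import math


def discard_probability(relative_frequency: float, t: float = 1e-5) -> float:
    """Probability of dropping one occurrence of a word during training.

    Assumes the formula P = 1 - sqrt(t / f), clipped at 0 so that words
    rarer than the threshold t are never discarded.
    """
    if relative_frequency <= 0:
        raise ValueError("relative_frequency must be positive")
    return max(0.0, 1.0 - math.sqrt(t / relative_frequency))


# Hypothetical relative frequencies, chosen only for illustration.
for word, freq in [("the", 0.05), ("house", 1e-4), ("doghouse", 1e-7)]:
    print(f"{word:10s} f={freq:.0e}  P(discard)={discard_probability(freq):.3f}")
```

Under these assumptions, very frequent words are dropped most of the time, while words below the threshold are always kept.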

## **2.2 Materials**

We experimented with a sample of the MorBoComp database including 163 English compounds. MorBoComp is a large, multilingual database of compounds that has been developed to study compounding from a typological perspective.<sup>3</sup> Each compound in the database is richly annotated (i.e., it is provided with information about headedness, compound and constituents' grammatical category, compound structure, etc.) and, crucially for our purposes, it is classified as Subordinate (hence, SUB), Attributive (hence, ATT) or Coordinate (hence, CRD) on the basis of the classification and terminology proposed by Bisetto & Scalise (2005). To illustrate, *schoolteacher* is tagged as SUB, *keyword* as ATT, and *king-emperor* as CRD.

Consistent with the criteria outlined in Bisetto & Scalise (2005), the 163-item sample contained cases of both "phrasal" compounds (*do-it-yourself illustration*, *around-the-world flight*) and "neoclassical" formations (*bibliography*, *theology*). In addition, a handful of items labeled with OTH (i.e., Other) were found. However, since this label was used by the annotators for either unresolved or idiosyncratic cases, we decided not to consider them in our investigation. Similarly, we removed neoclassical formations since their constituents can be affixes and suffixes rather than free-standing, independent words (e.g. *biblio-*). As a consequence, in our distributional semantics approach we could not have a vector representation for these items. Finally, 9 additional compounds were discarded since one of their constituents turned out not to be included in the 300k-vector semantic space. Specifically, 8 out of 9 of the missing items were first constituents of phrasal compounds, e.g. *all-goes-well* (in *all-goes-well atmosphere*) or *floor-of-a-birdcage* (in *floor-of-a-birdcage taste*), whereas in one case (*well-deserver*) the missing item was the second constituent (*deserver*). After this filtering process, our resulting dataset included 132 compounds (67 SUB, 49 ATT, 16 CRD), which we used for our experiment.

<sup>3</sup> For further details, see: http://morbocomp.sslmit.unibo.it/

### 2 Do semantic features capture a syntactic classification of compounds?

## **2.3 Generating composed representations**

For each of the 132 compounds in the list, we generated a composed representation using the vectors described in Section 2.1 and the compositional model by Guevara (2010). As previously mentioned, one of the main strengths of compositional DSMs is their ability to produce meaning representations also for combinations that are not attested in the source corpus. That is, given a novel or unattested compound, we are able to represent it as an independent vector on the basis of the meanings of its constituents (*zebra* and *horse*). This aspect was of crucial importance in our experiment, where 60 out of the 132 compounds extracted from MorBoComp turned out not to be present in the source semantic space. That is, almost half of the compounds were not among the 300k most frequent words in the corpus and, consequently, did not have a distributional representation. By using a compositional model capitalizing on the representations of the two constituents, however, we were able to overcome this limitation of traditional DSMs and generate a meaning representation for all the items, regardless of whether they had a "static" semantic representation or not.

The method used in the present study, in particular, was implemented by Guevara (2010) to model compositionality as depending on the semantic relation instantiated in the syntactic structure. As such, it looks particularly suitable for the case of compounds, which embed a modifier-head structure. Indeed, previous work proved this model to be very effective in generating composed representations for compounds (Marelli et al. 2017). Technically, the composed representations are obtained with the combinatorial procedure depicted in Figure 1: given two vectors $\vec{u}$ and $\vec{v}$, each representing one of the constituent words, their composed representation can be computed as $\vec{c} = \mathbf{M}\vec{u} + \mathbf{H}\vec{v}$, where $\mathbf{M}$ and $\mathbf{H}$ are weight matrices estimated from training examples. These matrices are trained using least squares regression,<sup>4</sup> having the vectors of the constituents as independent words (*dog*, *house*) as inputs and the vectors of example compounds (*doghouse*) as outputs. The two matrices are thus optimized so that the similarity between the weighted sum of the two constituent vectors (the composed vector) and the compound vector extracted from the semantic space (the observed vector) is maximized. Or, in other words, the composed vector obtained by means of the compositional model is built in a way that closely approximates the original one.
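The composition step itself can be illustrated with a small sketch. The three-dimensional vectors and the two weight matrices below are toy values invented for illustration. In the study the vectors have 400 dimensions and the matrices are estimated by least squares regression, which is not reproduced here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def compose(M, H, modifier, head):
    """Return the composed vector c = M * modifier + H * head."""
    weighted_modifier = matvec(M, modifier)
    weighted_head = matvec(H, head)
    return [m + h for m, h in zip(weighted_modifier, weighted_head)]


# Toy weight matrices (hypothetical): the modifier is down-weighted by half,
# the head is passed through unchanged.
M = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy constituent vectors (hypothetical values).
dog = [0.2, 0.8, 0.0]
house = [0.6, 0.0, 0.4]

doghouse = compose(M, H, dog, house)
print(doghouse)
```

Because each constituent has its own matrix, swapping modifier and head generally yields a different composed vector, which is what makes the model sensitive to constituent order.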

<sup>4</sup>As reported by Guevara (2010), this method is commonly employed to approximate functions in problems of multivariate multiple regression with a small number of observations and a greater number of variables, that is, a condition similar to the one involving high-dimensionality vectors representing word meanings and (relatively) limited data.


Figure 1: Representation of the training phase of the compositional method used in the study (adapted from Marelli et al. 2017).

In the present study, we trained the compositional model with a list of English noun-noun compounds extracted from the CELEX English Lexical Database (Baayen et al. 1995). By default, we treated all compounds as written in *solid* form, that is, without whitespaces or hyphens between the two constituents. When the solid compound was not found in our semantic space, we looked for it in its hyphenated form. The training set included 2174 triplets ⟨modifier, head, compound⟩, none of which was also present in the dataset we obtained from MorBoComp. We then used the estimated weight matrices for generating composed representations for each of the 132 compounds in our sample.

## **2.4 Semantic variables**

For each vector obtained compositionally, we computed four composition-based semantic measures, namely (1) similarity between the composed representation of the compound and its modifier (e.g. between *keyword* and *key*), (2) similarity between the composed representation of the compound and its head (e.g. between *keyword* and *word*), (3) neighborhood density, that is, the average cosine similarity between the composed vector and its top-10 nearest neighbor vectors in the semantic space (all three of these measures were introduced by Vecchi et al. 2011), and (4) entropy, that is, a measure of vector quality first introduced by Lazaridou et al. (2013).

By operationalizing the similarity between the composed compound vector and either constituent, in particular, we aimed at quantifying the extent to which each single word contributes to the overall meaning obtained compositionally. Although operationalized in terms of the cosine of the angle between the compound vector and either constituent (in the same way as standard DSMs


do), indeed, these measures genuinely describe the morphological process itself rather than merely taking into account its start and end points. Based on these properties, such measures have been recently used in studies with compound words. For example, they have been shown to be effective in predicting meaningfulness ratings on novel combinations (Günther & Marelli 2016) and in capturing relational information in compounds (Marelli et al. 2017).

As far as neighborhood density and entropy are concerned, both of them have been proposed to provide information about the meaningfulness of vectors encoding new concepts. The rationale of the former is that meaningful vectors should live in a region of the semantic space that is densely populated by vectors representing many related concepts, while meaningless vectors should be way more isolated. For the latter, the intuition is that meaningful vectors should have a skewed distribution, with few dimensions (corresponding to the salient semantic features of the word) being highly activated, i.e. having large values. In contrast, meaningless vectors should have a more uniform distribution, which would be a proxy for a less defined, fuzzier meaning. As a consequence, entropy would be inversely correlated with meaningfulness.
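The similarity, density, and entropy measures can be sketched as follows. Cosine similarity and the top-k average follow the description given above. The entropy function is only one possible operationalization, which is an assumption here: it normalizes the absolute vector values into a probability distribution and computes Shannon entropy, and it may differ from the exact definition in Lazaridou et al. (2013). All vectors are toy values.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood_density(target, space, k=10):
    """Average cosine between `target` and its k nearest neighbors in `space`.

    `space` maps words to vectors. A composed vector is not itself part of
    the space, so no self-match needs to be excluded.
    """
    similarities = sorted((cosine(target, vec) for vec in space.values()), reverse=True)
    top = similarities[:k]
    return sum(top) / len(top)


def entropy(vector):
    """Shannon entropy (in bits) of the normalized absolute vector values.

    Assumed operationalization: skewed vectors score low, uniform ones high.
    """
    weights = [abs(x) for x in vector]
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probabilities)


# Toy semantic space with hypothetical vectors.
toy_space = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(neighborhood_density([1.0, 0.0], toy_space, k=2))
print(entropy([0.9, 0.05, 0.05]), entropy([1.0, 1.0, 1.0]))
```

In this sketch a vector with a few highly activated dimensions receives a lower entropy than a uniform one, matching the intuition that entropy is inversely related to meaningfulness.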

A fifth semantic but non-compositional measure (5) was introduced following Lynott & Ramscar (2001), who employed Latent Semantic Analysis (LSA) to quantify the degree of similarity between the first and the second constituent of a compound. Here, we took the compound constituent vectors (e.g. the vectors of *key* and *word*) from the source semantic space (see Section 2.1) and simply computed their cosine similarity. This measure might be helpful in distinguishing between different compound classes, given that both theoretical linguistics (see Lieber 2009) and the conceptual combination literature (see Wisniewski 1996) have treated this factor as explanatory of different classes/interpretations.

## **2.5 Non-semantic variables**

In addition to the 5 semantic variables described above, we also included in our experiment a number of non-semantic control variables. For each compound and its constituent words we extracted word-form frequency from the source corpus (i.e., the number of times a word is encountered in the corpus in that exact form, regardless of its grammatical category). Compound frequency was calculated by summing the occurrences of the given compound in both solid and hyphenated orthographic form (*blackboard* and *black-board*, respectively). All frequency values, namely (6) compound frequency, (7) modifier frequency and (8) head frequency were subsequently log-transformed following standard practice in psycholinguistics (Brysbaert et al. 2018).
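As a minimal sketch of the compound frequency computation described above, the function below sums the counts of the solid and hyphenated forms and log-transforms the total. The counts are hypothetical, and two details are assumptions not stated in the text: the natural logarithm is used, and a compound with zero occurrences yields `None` rather than a smoothed value.

```python
import math


def compound_log_frequency(counts, solid, hyphenated):
    """Log-transformed frequency of a compound across both orthographic forms.

    `counts` maps word forms to raw corpus counts. Returns None when the
    compound is unattested, since log(0) is undefined.
    """
    total = counts.get(solid, 0) + counts.get(hyphenated, 0)
    return math.log(total) if total > 0 else None


# Hypothetical corpus counts, for illustration only.
toy_counts = {"blackboard": 90, "black-board": 10}
print(compound_log_frequency(toy_counts, "blackboard", "black-board"))
```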


Table 1: Mean and standard deviation of all the predictors included in the experiment. MCsim: modifier-compound similarity. HCsim: head-compound similarity. MHsim: modifier-head similarity. Comp length: compound length. Comp freq: compound frequency. Mod freq: modifier frequency. Head freq: head frequency.


In addition, we computed (9) Pointwise Mutual Information (PMI) between the constituents as a measure of compound lexicalization. This widely-used association measure (Church & Hanks 1990) compares the probability of co-occurrence of two words in the source corpus with the probability of the two words co-occurring by chance. To illustrate, although the word pair ⟨the apple⟩ is likely much more frequent than ⟨apple juice⟩, the PMI of the latter will be higher, since the determiner *the* is likely to co-occur very frequently with any noun in the corpus, thus being less informative compared to the pair ⟨apple juice⟩, whose mutual association is intuitively strong. In particular, the higher the degree of lexical association between two words, the higher the PMI value.
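The ⟨the apple⟩ versus ⟨apple juice⟩ contrast can be worked through with a small sketch. All counts are invented for illustration, and the computation makes the simplifying assumption that word and pair probabilities are estimated over the same total number of tokens.

```python
import math


def pmi(pair_count, first_count, second_count, total):
    """Pointwise Mutual Information in bits.

    PMI = log2( P(x, y) / (P(x) * P(y)) ), with all probabilities estimated
    as count / total (a simplifying assumption).
    """
    return math.log2(pair_count * total / (first_count * second_count))


# Hypothetical counts in a corpus of one million tokens.
TOTAL = 10**6
pmi_the_apple = pmi(pair_count=20, first_count=50_000, second_count=100, total=TOTAL)
pmi_apple_juice = pmi(pair_count=10, first_count=100, second_count=80, total=TOTAL)

print(pmi_the_apple, pmi_apple_juice)
```

With these toy counts, ⟨the apple⟩ occurs twice as often as ⟨apple juice⟩ yet receives a much lower PMI, because the high frequency of *the* inflates the chance co-occurrence in the denominator.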

Finally, we included (10) compound length measured as the number of characters making up the string (e.g., *blackboard* has length 10). When present, hyphens were not counted. Descriptive statistics including mean values and standard deviations for all the predictors used in the present experiment are reported in Table 1.


## **2.6 Data analysis**

Our hypothesis is that various, syntax-based classes of compounds might be predicted on the basis of semantic features. If this is correct, our semantic variables will turn out to be reliable predictors of one class over the others. In order to test our hypothesis, we included all the predictors reported in Table 1 in a series of logit regression models that individually estimated the probability of one class over the other. That is, we tested three separate models in the task of predicting one compound type against each of the others: (1) ATT vs SUB, (2) ATT vs CRD, (3) CRD vs SUB.
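For readers less familiar with logit models, the sketch below shows how a binary logit maps predictor values onto the probability of one class. The coefficients are hypothetical and do not come from the fitted models. They are merely signed so that higher compound-constituent similarity lowers the probability of ATT (coded 1) relative to SUB (coded 0).

```python
import math


def logit_probability(intercept, coefficients, predictors):
    """Probability of the class coded as 1 under a binary logit model."""
    linear = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for an ATT (1) vs SUB (0) model.
toy_coefficients = {"MCsim": -3.0, "HCsim": -2.0}

low_similarity = {"MCsim": 0.2, "HCsim": 0.3}
high_similarity = {"MCsim": 0.7, "HCsim": 0.8}

print(logit_probability(1.5, toy_coefficients, low_similarity))
print(logit_probability(1.5, toy_coefficients, high_similarity))
```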

All analyses were carried out within the R statistical computing environment. We adopted a backward procedure to progressively simplify each statistical model. Starting from a full-factorial model including all the independent variables, predictors were removed one by one when their absence did not significantly lower the overall model fit. At each step, the removal procedure was attempted for the predictor with the largest *p*-value. The contribution of each parameter to be removed was checked with a goodness-of-fit chi-square test. Finally, atypical outliers were identified and removed using as a criterion 2.5 standard deviations of the residual errors.
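A single step of the backward procedure can be sketched as follows. The sketch assumes that the goodness-of-fit check is a likelihood-ratio test between the model with and without the candidate predictor (one degree of freedom) and that the significance threshold is 0.05. Neither detail is stated explicitly above, and the log-likelihood values are hypothetical. The analyses themselves were run in R, so this Python code is only an illustration of the logic.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for df = 1.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


def removal_is_justified(loglik_full, loglik_reduced, alpha=0.05):
    """Decide whether dropping one predictor leaves the fit essentially intact.

    Returns (decision, statistic, p_value). The predictor is removed when
    the drop in fit is not significant at the assumed level `alpha`.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = chi_square_p_value_df1(statistic)
    return p_value > alpha, statistic, p_value


# Hypothetical log-likelihoods of a model with and without one predictor.
print(removal_is_justified(loglik_full=-60.0, loglik_reduced=-60.4))
print(removal_is_justified(loglik_full=-60.0, loglik_reduced=-64.0))
```

In the first call the fit barely changes, so the predictor would be dropped. In the second the fit worsens significantly, so the predictor would be retained.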

## **3 Results**

For a clearer presentation of the results, we summarize them in tables and discuss each model in a separate section. In the leftmost part, each table reports the list of variables included in the full-factorial version of each model. In the central part, the model-simplification procedure (*Removal order*), the chi-square goodness-of-fit test (*Chi-square*) and its results (*p*) are reported. The rightmost part shows the effects of the variables included in the final model.

## **3.1 ATT vs SUB**

The first model, testing ATT (*halfprice*) against SUB (*bus-stop*) compounds, reliably distinguishes the two classes on semantic bases. As shown in Table 2, SUB is predicted against ATT by the higher semantic similarity between the compound and either the modifier (*p* = 0.0182) or the head (*p* = 0.0355). That is, the meaning of SUB compounds such as *bus-stop* is found to be more strongly determined by the individual meanings of its constituents compared to ATT compounds like *halfprice*, since both the modifier and the head contribute to the overall meaning to a greater extent than either constituent of ATT compounds does. Therefore, the



Table 2: Results of the logit model opposing ATT (1) to SUB (0).

higher the similarity between the compound and either constituent, the higher the probability to have a SUB rather than an ATT compound.

It should be noted that frequency measures, entropy, compound length and the similarity between the two constituents were progressively removed from the model. That is, their effects do not contribute to predicting one class over the other. The remaining variables, namely PMI and neighborhood density, are instead included in the final model, even though their effect is only partially reliable (*p* > 0.05). Both these measures, in any case, indicate that higher values of PMI and density are more likely to predict ATT rather than SUB compounds.

## **3.2 ATT vs CRD**

The second model tests ATT (*halfprice*) against CRD (*comedy-drama*) compounds. As reported in Table 3, our model reliably distinguishes between the two classes on the basis of a single, highly significant semantic predictor, namely the semantic similarity between the compound constituents (*p* = 0.0002). In particular, the higher the similarity between the modifier and the head of a compound, the higher the probability of having a CRD rather than an ATT compound. All other variables have been progressively removed from the final model since none of them significantly contributes to the overall goodness of fit.


Table 3: Results of the logit model opposing ATT (1) to CRD (0).

Table 4: Results of the logit model opposing CRD (1) to SUB (0).



## **3.3 CRD vs SUB**

The third model opposes CRD (*comedy-drama*) to SUB (*bus-stop*) compounds. As in the other cases, the model reliably distinguishes between one class and the other on semantic bases. In particular, CRD is predicted over SUB by the degree of semantic similarity between the two constituents (*p* = 0.0024). The greater the similarity between the modifier and the head of a compound, the higher the probability of having a CRD rather than a SUB compound. Also, SUB is predicted over CRD by the degree of similarity between the compound and its head (*p* = 0.0146). That is, the head constituent contributes more to the overall meaning of SUB compounds (e.g., *stop* in *bus-stop*) than CRD compounds (e.g., *drama* in *comedy-drama*).

In addition to these semantic variables, compound length also turns out to be predictive of one class over the other. As reported in Table 4, in fact, its effect is reliable (*p* = 0.0459) and it indicates that longer compounds are more likely to be SUB than CRD. All other parameters were instead progressively removed.

## **3.4 Overall results**

Taken together, these results indicate that the degree of semantic similarity between the compound's constituents (i.e. the modifier and the head) is a highly reliable predictor of CRD against both other classes. As shown in the barplot in Figure 2, the higher the similarity between the constituents, the more likely a compound is to be CRD rather than either ATT (*p* = 0.0002) or SUB (*p* = 0.0024). Moreover, the semantic similarity between the compound and its head is a predictive measure of SUB over both other types, as shown in Figure 3. That is, the more the head contributes to the meaning of the overall compound, the more likely the compound is to be SUB rather than either ATT (*p* = 0.0355) or CRD (*p* = 0.0146).

In order to evaluate the predictive power of each model, we further computed the accuracy with which the items under investigation were assigned to the correct classes. First, we obtained the classes predicted by each logit model. Second, we computed the accuracy of each model by dividing the number of correctly predicted items by the total number of items included in the final model. As a comparison, for each model we also computed the accuracy of a majority baseline obtained by simply dividing the number of cases of the majority class by the total number of cases involved. As reported in Table 5, the best predictive model turned out to be the one opposing CRD vs SUB (0.90 accuracy) followed by ATT vs CRD (0.85) and ATT vs SUB (0.61). These numbers were in line with the

Figure 2: Similarity between modifier and head is predictive of CRD over both ATT and SUB.

Figure 3: Similarity between the compound and its head is predictive of SUB over both ATT and CRD.


Table 5: From left to right: overall accuracy of each model as compared to the accuracy of the majority baseline, correctly predicted cases, missed cases. In brackets we report the correct class.


pattern of accuracy obtained by the majority baseline, which is sensitive to the low number of CRD cases and therefore outputs higher scores for comparisons involving this class. Though our models always outperformed the baselines, the increase was noticeably lower in ATT vs SUB (+3%) compared to both CRD vs SUB (+9%) and ATT vs CRD (+10%). The limited number of items does not allow us to make any statistically reliable claim on the performance of the classifier. However, our focus is on testing whether the membership in a compound class is affected by a set of theoretically-relevant variables rather than proposing an effective classification algorithm. In this light, our results provided evidence for the effectiveness of these models. At the same time, they suggested that experimenting with more data would be desirable to further validate their power.
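The accuracy and majority-baseline computations can be sketched as below. The class counts are those reported in Section 2.2 (67 SUB, 49 ATT, 16 CRD). Since outliers were removed before the final models were fitted, the baselines computed here are approximations and need not match Table 5 exactly. The function names are hypothetical.

```python
def accuracy(gold, predicted):
    """Proportion of items whose predicted class equals the gold class."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Class counts from Section 2.2, before outlier removal.
counts = {"SUB": 67, "ATT": 49, "CRD": 16}

baselines = {}
for first, second in [("ATT", "SUB"), ("ATT", "CRD"), ("CRD", "SUB")]:
    pair_counts = {first: counts[first], second: counts[second]}
    baselines[(first, second)] = majority_baseline(pair_counts)
    print(f"{first} vs {second}: baseline = {baselines[(first, second)]:.2f}")
```

The sketch makes the effect of class imbalance visible: the fewer CRD items a comparison contains relative to the other class, the higher the score that a trivial majority classifier achieves.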

Besides accuracy, Table 5 reports some cases of correctly predicted and missed compounds for each of the models.

## **4 Discussion**

The present study investigated whether various, syntax-based classes of compounds (Subordinate, Attributive, Coordinate) can be described in terms of the quantitative, continuous properties of the meaning of the compounds and their


constituents. To obtain these semantic measures, we generated cDSM representations for a list of compounds for which such classification was available. By running a series of logit models including both semantic and non-semantic factors as independent variables, we showed that our models are able to reliably capture different classes by means of semantic features.

## **4.1 On the modifier-head similarity**

In particular, we showed that Coordinate compounds like *comedy-drama* are predicted over either Subordinate (*bus-stop*) or Attributive (*halfprice*) by the higher semantic similarity between the head and the modifier. This finding is consistent with previous evidence from both theoretical linguistics and psychology. Within the lexical semantics approach, Lieber (2009) indeed proposed that Coordinate compounds are generated when the two constituents share almost identical "bodies" and "skeletons", that is, when the words to be combined have highly similar meanings.

Also, our finding is in line with several theories of conceptual combination, according to which Hybrid or Conjunctive interpretations would be produced by people for novel combinations which involve highly similar concepts, e.g. *moose-elephant* (see among others Wisniewski 1996). Accordingly, and consistent with our results, Relation-linking interpretations (roughly equivalent to Subordinate compounds) would be instead produced for semantically highly dissimilar pairs, e.g. *apartment-dog*. Since in our model the similarity between the constituents also distinguishes between Coordinate (Hybrid/Conjunctive) and Attributive compounds (Property-mapping), we argue that this result is consistent with the graded description proposed in many conceptual combination theories, where the difference between Property-mapping and Hybrid combinations would be due to an increasing number of both "commonalities" and "alignable differences" between the concepts to be combined (Wisniewski 1996).

## **4.2 On the semantic role of compound constituents**

Second, we showed that Subordinate compounds are predicted against Attributive on the basis of the higher similarity between the compound and either constituent. That is, in compositionally obtained Subordinate compounds both the modifier and the head contribute to a greater extent to the overall meaning than in Attributive ones. Moreover, the similarity between the compound and its head is a reliable predictor of Subordinate over both other classes.

First of all, these findings are again consistent with the lexical semantics literature (Scalise et al. 2005; Lieber 2009). In it, Subordinate compounds are typically


characterized by a structure in which the head *selects* its argument. Therefore, the head contributes more to the overall meaning in this kind of compounds compared to the other classes, where a formal relation between the elements is absent. Also, these results are consistent with the different mechanisms proposed in the conceptual combination literature for Relation-based interpretations (capitalizing on a "slot-filling" procedure) and Property-based ones (where an "alignment" process is routinely carried out) (Wisniewski & Gentner 1991; Wisniewski 1996). In a nutshell, the slot-filling procedure would imply a bigger role of the compound *head* compared to the other competing mechanism since, during combination, the head would be just *filled* in one of its "slots" by the modifier concept.

Interestingly, these findings are also consistent with evidence from embodied cognition (Louwerse 2008). In particular, the embodied conceptual combination theory (ECCo) by Lynott & Connell (2010) proposes that the great majority of relational interpretations (corresponding to Subordinate compounds) are "nondestructive", namely, they result from the combination of constituent concepts that are left intact during the meshing of their "affordances". To illustrate, in this approach the compound *picture book* (i.e. 'a book that has pictures') is nondestructive, since the *pictures* in question are still intact entities in the pages of the *book*. Simplifying somewhat, the combinatorial procedure that leads to Relation-based interpretations (Subordinate) does not modify heavily the meaning of the original constituents, whereas Property-mapping ones (Attributive) are almost always destructive, that is, they involve the "destruction" of (part of) the constituent concepts. Using an example from Lynott & Connell (2010), the compound *icicle fingers* would reduce *icicle* to a representation of 'coldness' and 'stiffness'. At the same time, the representation of the head (*fingers*) would be switched toward a more figurative, metaphorical meaning, less similar to its prototypical representation (see also, e.g., *iron curtain*). In this light, the similarity between either constituent as an independent word and the compound will be generally higher in Relation-based (Subordinate) compared to Property-based interpretations (Attributive), given that the combinatorial procedure of the former type does not heavily modify the meaning of the original constituents. Moreover, this observation provides indirect evidence that meaning representations extracted from texts via distributional semantics models can encode grounded information, at least to some extent (Louwerse 2011).


## **4.3 On attributive compounds**

Third, compositionally-derived Attributive compounds are characterized by both a weaker contribution of the constituents in determining the overall meaning compared to Subordinate and a lower similarity between the constituents compared to Coordinate. This pattern of results is again consistent with Lieber (2009), who proposes that Attributive formations emerge when the semantic features of the constituents are too disparate to be interpreted in a Coordinative way and lack the argument structure that is typical of Subordinate compounds. Accordingly, Attributive compounds would represent a last-resort strategy used when the typical semantic features of the other classes are not satisfied (Lieber 2009). This description, according to which Attributive compounds would result when no discriminative features are present, is in line with evidence from conceptual combination showing that acceptability judgements for Property-based (Attributive) interpretations to novel compounds (e.g., a *whale boat* is 'a large boat') are slower compared to Relation-based (Subordinate) interpretations (e.g. a *whale boat* is 'a boat for hunting whales') (Gagné 2000). According to Gagné & Spalding (2015), indeed, this would suggest that Relation-based interpretations are the product of an initial compositional process that, in the absence of the features that lead to either a relational interpretation (Subordinate) or a coordinate interpretation (Coordinate), leads to Property-mapping interpretations.

## **4.4 On the methodology**

On the methodological level, it should be mentioned that we used a compositional model to generate representations for a list of compounds whose constituents were nouns, verbs, adverbs, adjectives, etc. even though in the training phase only noun-noun compounds from CELEX were used. This could have represented a weakness for the system, causing the model to be biased toward noun-noun combinations. By looking at the results, however, we observed a similar, remarkably good performance of the model in all items, regardless of the grammatical category of the constituents. This is also clear from the examples in Table 5, where it can be noted that the parts-of-speech are almost uniformly distributed. However, it is still possible that a richer training set would lead to even better results, perhaps achieving a better performance in generating meaning representations for less systematic, more opaque compounds. Indeed, we hypothesize that the lower accuracy obtained by the model opposing Attributive vs Subordinate compared to the others might be due to this issue. Finally, we believe that the effectiveness of such an approach might


be further validated by testing it on a larger (and possibly balanced with respect to compound type) set of annotated compounds. This, on the one hand, would strengthen the predictive power on the prediction task. On the other hand, it would allow more extensive, fine-grained analyses on the successes and failures of the models. We plan to further investigate this issue in future work.

## **4.5 On the effectiveness of cDSMs in predicting compound relations**

The effectiveness of our approach in the proposed task is in line with previous work showing that compositional models of distributional semantics are successful in capturing relational information between the constituents of a compound. In particular, our task is related to that of predicting compound semantic interpretation (see Section 1.5), where compositionally-obtained representations have been used to assign the correct semantic relation to noun-noun expressions. By experimenting with a number of cDSMs (including the one adopted in this study by Guevara 2010), for example, Dima (2016) obtained results comparable to the state of the art (SoA) on 2 popular datasets (Ó Séaghdha 2007; Tratz & Hovy 2010). Compared to SoA methods, however, Dima (2016) only exploited information from word embeddings, thus proving the effectiveness of both distributed representations and compositional methods. In quantitative terms, our results are not directly comparable due to both the different experimental setting (we did not tackle the task as a classification problem) and the number of relations involved (3 vs either 6 or 43). Moreover, our results cannot be compared with previous work since, to our knowledge, we are the first to propose this task. However, these studies jointly show that compositional representations are successful in predicting compound relations defined on either *semantic* or *syntactic* bases.

## **4.6 Final remarks**

In conclusion, this study suggests that different compound types identified on syntactic bases can be also defined in terms of continuous, quantitative features of the meaning of the compound and its constituents. We believe that discrete and continuous approaches are two faces of the same coin, the former representing a theoretically motivated, cross-linguistically valuable framework aimed at describing complex linguistic phenomena, the latter providing an interesting way to quantitatively test them. As indicated by our results, compositional models of distributional semantics present a flexible and powerful way to capture many of these phenomena.

## **Acknowledgements**

Sandro Pezzelle did most of the work reported in the present article while employed at CIMeC, University of Trento. We are grateful to Marco Baroni and Laura Vanelli for their valuable feedback during the early stages of the project. We thank Sergio Scalise for providing the MorBoComp subset used in the experiment. We are also grateful to the participants of the First Quantitative Morphology Meeting (Belgrade, July 2015) for the helpful questions and discussion.

## **References**




## **Chapter 3**

## **Compositionality in English deverbal compounds: The role of the head**

Gianina Iordăchioaia University of Stuttgart

Lonneke van der Plas University of Malta

Glorianna Jagfeld Lancaster University

This paper is concerned with the compositionality of deverbal compounds such as *budget assessment* in English. We present an interdisciplinary study on how the morphosyntactic properties of the deverbal noun head (e.g., *assessment*) can predict the interpretation of the compound, as mediated by the syntactic-semantic relationship between the non-head (e.g., *budget*) and the head. We start with Grimshaw's (1990) observation that deverbal nouns are ambiguous between compositionally interpreted argument structure nominals, which inherit verbal structure and realize arguments (e.g., *the* assessment *of the budget by the government*), and more lexicalized result nominals, which preserve no verbal properties or arguments (e.g., *The* assessment *is on the table.*). Our hypothesis is that deverbal compounds with argument structure nominal heads are fully compositional and, in our system, more easily predictable than those headed by result nominals, since their compositional make-up triggers an (unambiguous) object interpretation of the non-heads. Linguistic evidence gathered from corpora and human annotations, and evaluated with machine learning techniques, supports this hypothesis. At the same time, it raises interesting discussion points on how different properties of the head contribute to the interpretation of the deverbal compound.

Gianina Iordăchioaia, Lonneke van der Plas & Glorianna Jagfeld. 2020. Compositionality in English deverbal compounds: The role of the head. In Sabine Schulte im Walde & Eva Smolka (eds.), *The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective*, 61–106. Berlin: Language Science Press. DOI:10.5281/zenodo.3598558


## **1 Introduction**

This paper contributes a study on how constituents influence the compositionality of multiword expressions from the perspective of deverbal compounds in English with a focus on the role of their head nouns.

## **1.1 Deverbal compounds (DCs)**

DCs are noun-noun compounds with a deverbal head as illustrated in (1). In contrast to root compounds (RCs) (see 2), whose head nouns are typically simple (non-derived), DCs usually receive an interpretation in which the non-head establishes a syntactic-semantic relationship with the verb from which the deverbal noun is derived (i.e., as a direct object, subject or other argument/adjunct). RCs often receive a fixed interpretation (see 2a) or one depending on the immediate context (see 2b). *Tomato bag* in (2b) may refer to a bag of tomatoes, a bag having the shape or color of a tomato, or any other connection between a bag and tomatoes mentioned in previous context. The same holds for *jelly bottle*.


	- b. tomato bag, jelly bottle

Nominal DCs may be headed by deverbal nouns built with a variety of suffixes, including those that form participant-denoting nominals, as in (3a) for agents and in (3b) for patients (see Lieber 2016: 73). For reasons that will be given in Section 3.2, we concentrate here on DCs headed by eventive deverbal nominals as in (1), formed by means of the suffixes *-al*, *-ance*, *-(at)ion*, *-ing*, and *-ment*.

	- b. bank employ**ee**, award nomin**ee**

## **1.2 Argument structure nominals and result nominals**

Grimshaw (1990) points out that the majority of deverbal nouns exhibit an ambiguity between an argument structure nominal (ASN; her *complex event nominal*) reading, which perfectly mirrors the corresponding verb phrase with its argument structure, and a result nominal (RN) reading, which is more lexicalized and departs from the base verb to various degrees.<sup>1</sup> The crucial difference between the two originates in the availability of verbal event structure, which enforces and constrains argument realization in ASNs (see (6) below), and its absence in RNs. The examples in (4) illustrate the two readings, building on Grimshaw (1990: 49).

	- b. The **examination**/\*exam *of the patients* took a long time. (ASN)
	- c. \* The **examination** *of the patients* was [on the table/in the bag]. (ASN)

In the absence of the object argument *of the patients*, the noun *examination* receives an RN reading, in which, similarly to *exam*, it denotes a concrete entity, which can lie on a table or be in a bag (see 4a). When the argument is realized, the synonymy with *exam* is lost, and the noun behaves like a nominalized verb, expressing an event, which can take a long time (see 4b), but cannot be on a table or in a bag (see 4c). In combination with *exam*, the phrase *of the patients* in (4b) could receive a possessive interpretation, i.e., the exam that belongs to the patients, but not that of an object argument of an examining event, since *exam* lacks such a reading. A similar interpretation would be possible in (4c) with *examination* on its RN reading.<sup>2</sup>

## **1.3 Compositionality and transparency in deverbal compounds**

Compositionality has long been a prominent issue in theoretical linguistics with a first formalization offered in Montague's (1970) *Universal Grammar*. A simple formulation of the principle of compositionality in this tradition is given in (5).

(5) The principle of compositionality (PoC, Partee 1984: 281) The meaning of an expression is a function of the meanings of its parts and of the way they are syntactically combined.

According to the PoC in (5), the interpretation of a complex expression relies on the individual meanings of its parts and their syntactic combination. Leaving technical details aside, an expression like *to kick the bucket* will be interpreted compositionally from the meanings of the verb *to kick* and of the noun phrase *the bucket*, via a verb–direct object syntactic relationship and the corresponding semantic relation. On this compositional reading, this expression is semantically transparent both with respect to the meanings of the parts and the syntactic-semantic relationship: the object *the bucket* is semantically interpreted as a patient of the kicking. However, *to kick the bucket* also has the idiomatic reading *to die*, on which neither the meanings of the two parts, nor any syntactic relationship between them can be compositionally retrieved. There is nothing particular about kicking or buckets or the verb–direct object relationship between them to be found in the meaning of *to die*. This reading is non-compositional and opaque.

<sup>1</sup> For the sake of simplicity, we leave aside Grimshaw's third possible reading of deverbal nouns as simple event nominals, since, from the perspective of the properties we consider here, they pattern with RNs and contrast with ASNs in similar ways.

<sup>2</sup> In her examples, Grimshaw strictly uses *of the patients* on its argument interpretation.

Some idiomatic expressions, however, may be partially compositional. For instance, in *to spill the beans* 'to divulge a secret', the verb–object relationship is preserved in the idiom meaning and, while the object *beans* is lexico-semantically unrelated to *secret*, the verb *to spill* shares lexical semantic properties with *to divulge* (i.e., 'to let out'), which can be viewed as its figurative meaning. In this expression, the non-head is opaque, the head is partially transparent, and the relationship is compositional and transparent. The head is only partially transparent because it is ambiguous and the meaning *divulge* is not its basic meaning.

Deverbal compounds offer another pattern of expressions that are not fully compositional – yet, one different from the idioms above. The interpretation of DCs usually relies on a syntactic-semantic relationship between the base verb of their head noun and their non-heads, as shown in (1). Unlike in the corresponding verbal phrases, however, the syntactic relationship is not overt in DCs: e.g., *budget* in the DC *budget assessment* is not marked with accusative case as in the corresponding verb phrase in (1a), and *police* in *police questioning* is not marked by nominative case in (1b).<sup>3</sup> In the absence of overt marking, it is often unclear how to interpret the non-head of a DC, as, for instance, in *police killing*, where *police* could be either the object or the subject of *kill*. The indeterminacy of the syntactic relationship leads to ambiguity, which reduces the transparency of DCs from the perspective of syntactic compositionality, even though the meanings of the parts are transparent (by contrast with *beans* in *to spill the beans* or *kick* and *bucket* in the idiom *to kick the bucket*).

Yet, following the PoC and the compositional make-up of a sentence, if a particular DC is built up compositionally in parallel to the corresponding verbal phrase, then an object interpretation of the non-head is expected. This is the thesis we will follow and support here. But why does an object non-head indicate compositionality and, e.g., a subject does not? The reason follows from simple sentence structure. Transitive verbs form immediate constituents with their direct objects but not with their subjects, which is why in sentence structure we first form a VP from the verb and its object, and the subject attaches afterwards, usually under a different projection such as VoiceP (or little vP), as in (6) (see Chomsky 1995 and Kratzer 1996 for a discussion on the differences between objects and subjects with respect to the event structure of verbs).

<sup>3</sup> In a morphologically poor language like English, case marking comes from the syntactic position of the noun phrase, which is also missing in DCs, given their fixed word order.

A DC based on the construction in (6) contains *two* nouns: one is the head derived from the verb and the other is the non-head. The latter can realize only one of the two arguments of the verb. Given the hierarchical structure in (6), this must be the object: see *budget assessment*. Nothing prevents the original subject from being realized as a non-head (e.g., *government assessment*). In that case, however, the DC does not follow the compositional make-up in (6), since the object is missing and the subject cannot form a constituent with the transitive verb alone. Such a DC will be interpreted by means of world knowledge, similarly to RCs as in (2). From this perspective, the subject behaves just like an adjunct/modifier, since it does not play any role in the compositional make-up of the DC.<sup>4</sup>

The importance of compositionality in language use is indisputable: without recursive compositional rules, speakers would not be able to produce and understand infinitely many sentences (Dowty 2007). That compositionality in DCs imposes an object interpretation, as predicted by the structure in (6), is supported by the fact that the default reading of a possibly ambiguous DC like *police killing* is that with an object non-head; the subject reading becomes available if established by a particular context, e.g., recent discussion in the U.S. about police killing unarmed civilians. Similarly, out of context, *student evaluation* also receives an object interpretation. The subject reading is brought about by a particular social environment in which people talk about students evaluating their teachers. Moreover, as shown in the linguistic literature (Grimshaw 1990; Borer 2013; Iordăchioaia et al. 2017), if a DC type is compositionally derived from a VP, it should also be fully productive: that is, any verb–object combination should be able to form a compositional DC, which is confirmed, for instance, by (7b). By contrast, not every subject–verb combination can form a DC: the non-heads in (7c) may at best receive a peculiar object interpretation, but not the subject reading of the corresponding sentence in (7a).

<sup>4</sup> Indirect objects are included here as well, since they also attach to the verb after the direct object does: see Larson (1988).

	- b. **window** breaking, **pen** breaking
	- c. \* *boy* breaking, \**girl* breaking

To summarize, ambiguous DCs as in (8) below are partially opaque, as the relationship between the two nouns is not explicit, and may receive several interpretations. However, if the DC is interpreted compositionally (in parallel to the verbal construction), it will be fully transparent and involve an object reading. The task remains to find independent evidence for the compositionality of a DC. In this respect, we will follow Grimshaw's (1990) distinction from Section 1.2 concerning the head nouns of DCs, as specified below in Section 1.5 and Section 4.1.


## **1.4 Terminology**

Before we introduce our research program, a few terminological clarifications are in order. The term *compositionality* is often used without particular focus on the syntactic-semantic relationship between the parts of the complex expression, an aspect that is of crucial importance in our study. Natural Language Processing literature on (root) noun-noun compounds, for instance, occasionally speaks of *compositionality ratings*, in which annotators evaluate how accessible the lexical meaning of the two nouns is in the overall meaning of the compound (see Section 2.2.2 for details and references). This notion of compositionality is similar to what we call *lexico-semantic transparency* below.

A notion of *compositionality* that is closer to ours appears in some Distributional Semantics (DS) approaches, which, in view of the PoC in (5), seek to identify linguistically-informed composite functions to combine the individual parts of complex expressions (Marelli & Baroni 2015; Baroni & Zamparelli 2010). Like us, these authors take a closer look at the relationship between the parts; however, their focus is more on the technical implementation (i.e., the DS correspondent of function application from theoretical linguistics) rather than on the linguistically relevant constraints that are at play. Although we share the interest in the relationship between the parts with this literature, we are not concerned with the technical details of the function, but with how this relationship interacts with other morphosyntactic properties of the head, as explained in Section 1.5.

We use the terminology as follows: *compositional* refers to DCs that encode the structure in (6). Some may call this "syntactic compositionality". The term *transparent* is broader and allows two specifications. First, *lexico-semantically transparent* characterizes compounds whose parts are semantically fully recoverable from the compound meaning. These include all DCs as in (1) and (8), as well as some RCs like those in (2).<sup>5</sup> Second, what we would call *compositionally transparent* applies to DCs that, besides being lexico-semantically transparent, also follow the structure in (6). These correspond to our *compositional DCs*, since all the DCs we consider here are lexico-semantically transparent.

## **1.5 Our contribution**

We start with the assumption that an important source of ambiguity in DCs such as in (8) is the ambiguity of their deverbal head nouns as in (4) and the correlated ambiguous relationship that they establish with the non-heads. The non-head is entirely transparent in DCs: its lexical semantics is present in the DC meaning, and, as an argument or adjunct, it brings no syntactic constraints to influence its syntactic-semantic relationship with the head noun. By contrast, the head noun is more complex. Its lexical semantics is also visible in the DC; yet, following Grimshaw's distinction in (4), its ambiguity between ASN and RN readings has a great impact on its syntactic-semantic relationship with the non-head. As perfect transpositions of verb phrases, ASNs follow the compositional structure in (6) and require objects to be realized first. RNs maintain only remote lexical connections to the verb base and do not inherit their compositional structure. Thus, RNs impose no syntactic requirements on the non-heads and are compatible with any syntactic-semantic relationship allowed by their lexical semantics.

Following this reasoning, our hypothesis is that DCs with ASN heads will obey the constituent structure in (6) and realize only objects as non-heads. These DCs will be both compositional and lexico-semantically transparent. DCs whose heads are RNs do not respect this structural condition and allow any interpretation that a context or world knowledge provides – whether related to the base verb or not (cf. *police building* 'building that hosts the police department'). In this respect, DCs headed by RNs are semantically similar to RCs; their deverbal morphology is irrelevant for their interpretation, since they are lexicalized. Such DCs are lexico-semantically transparent, but they are not fully compositional.

<sup>5</sup> Other RCs like *hogwash* are substantially less transparent: see the previous literature in Section 2.2.

To test this hypothesis, we use a series of morphosyntactic properties that Grimshaw argued to be ASN-specific (see Section 4.1) and check their presence in the behavior of DC heads, on the basis of evidence from a large corpus of naturally occurring text. Since it is not a given fact that the ASN-features defined by Grimshaw can be reliably informed by corpora, we also gathered human judgments on ASN-hood – namely, we asked annotators to indicate to what extent the deverbal head refers to a process (or verbal event). By asking annotators directly for their judgments, we try to get an estimate for the latent variable that underlies the ASN properties defined by Grimshaw. We use these different types of data as features in a logistic regression classifier, by which we aim to predict the syntactic-semantic relation between the head and the non-head. These results are compared with the manually annotated interpretation of DCs.
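The classification setup described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the implementation used in this study: the feature names (`of_phrase_rate`, `singular_rate`, `annotated_process_score`) are hypothetical, the numbers are invented toy values, and the logistic regression is a bare-bones gradient-descent version written with the Python standard library.

```python
import math

# Hypothetical feature names; all values below are invented toy numbers,
# not data or results from the study.
FEATURES = ["of_phrase_rate", "singular_rate", "annotated_process_score"]

# Each row: (feature vector for a DC head, 1 if the non-head is an object, else 0)
TOY_DATA = [
    ([0.8, 0.9, 0.9], 1),
    ([0.7, 0.8, 0.8], 1),
    ([0.9, 0.7, 0.7], 1),
    ([0.2, 0.3, 0.1], 0),
    ([0.1, 0.4, 0.2], 0),
    ([0.3, 0.2, 0.3], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that the non-head is an object, given head features x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(data, epochs=2000, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias

weights, bias = train(TOY_DATA)
# Label each toy item as object (1) or non-object (0) non-head.
predictions = [int(predict_proba(weights, bias, x) >= 0.5) for x, _ in TOY_DATA]
```

In such a setup, the predictions would then be compared with the manually annotated relation for each DC, and individual features could be left out one at a time to gauge their contribution.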

Given our hypothesis and methodology, we expect that the ASN-features extracted from the corpus, as well as that based on human judgments, will point to an object interpretation of the DC (as predicted by 6) and will have high predictive power in determining whether the DC's non-head is an object or not. A high predictive power of the features will additionally show us that compositionality is an important aspect in the disambiguation of DCs.

First of all, our results indicate that all the ASN-hood features have predictive power above the chance level when tested individually and together. The most stable individual features point to an object interpretation, as expected under our hypothesis. Second, the ablation experiments show that many features overlap in the identification of ASN-hood, inviting theoretical reflection on the individual contribution of these features. Third, the best feature is the manual annotation of ASN-hood, which confirms the importance of this property for interpreting DCs; it also indicates that either the morphosyntactic features are comparatively weaker or our corpus did not offer enough material for better results. Fourth, some weaker features raise stimulating questions especially relevant for linguistic investigation.

Our study investigates transparency strictly from the perspective of the compositional structure in (6). The degree of (lexico-semantic) transparency of DCs that do not receive such a verb-related compositional interpretation (i.e., those headed by RNs) goes beyond the scope of our present study and must be left for a future endeavor. As mentioned above, the role of world knowledge and context is essential for such DCs. Therefore, such an investigation would need to employ a different methodology, more similar to that pursued in several computational studies as presented in Section 2.2.2. We also do not aim to measure speaker intuitions about the transparency degrees of DCs (as done in some of these computational approaches), although it would be interesting to compare such ratings with our relation-based annotations in the future. Our present study conceptually differs from these computational approaches, as it addresses the transparency of DCs from a structural perspective. We use insights from theoretical linguistics on the morphosyntactic properties of the deverbal noun heads of DCs and general principles of syntax-semantics mapping, and test these theoretical hypotheses with corpus-based and computational methods.

We start with an overview of relevant previous studies from theoretical linguistics (TL) and natural language processing (NLP) in Section 2. Sections 3 and 4 describe our data collection and methodology; Section 5 presents our experiments, followed by a discussion in Section 6. We draw our conclusions in Section 7.

## **2 Previous literature**

In Section 2.1 we introduce the main theoretical concepts that have guided our investigation and briefly refer to previous analyses of DCs relevant to our assumptions. Section 2.2 presents the NLP literature on deverbal and root noun-noun compounds and the extent to which these studies can be compared with ours.

## **2.1 Theoretical approaches to DCs**

Deverbal compounds have been at the forefront of theoretical linguistics since the early days of generative grammar. Especially beginning with the 1970s, after Chomsky's (1970) *Remarks on nominalization*, the theme of the theoretical debate has been whether word formation is part of the syntax or the lexicon. Syntactic approaches have argued that DCs behave systematically enough to be accounted for by syntactic rules (Roeper & Siegel 1978; Ackema & Neeleman 2004); lexicalist approaches have pointed out peculiar properties of DCs, which would require their analysis as part of the lexicon (Selkirk 1982; Lieber 2004).

The syntax vs. lexicon debate is relevant for our study in so far as recognizing a syntactic component in DCs leads to their compositional analysis, while specifying lexical rules for them suggests that they are like RCs and lack a systematic morphosyntax that preserves phrase-like compositionality. Meanwhile, both theoretical trends have argued for both kinds of analysis of DCs, and we will abstract away from the type of framework to focus on the properties of DCs.


Notably, in theoretical studies the problem of compositionality in DCs is not addressed with respect to the contribution of the two individual nouns as done in recent NLP studies (see Section 2.2). If available at all, implications for compositionality come indirectly from the claims on the make-up of DCs and the structural relationship between their parts as in (6) (see Section 2.1.2).

### **2.1.1 Morphosyntactic properties of ASNs**

In support of the contrast illustrated in (4), Grimshaw (1990) argues that deverbal nouns in their ASN reading exhibit a special morphosyntactic behavior, which is not shared by RNs. Table 1 is a summary of the main contrastive properties of ASNs (vs. RNs) from Grimshaw (1990) that are relevant for our study, adapted from Alexiadou & Grimshaw (2008: 3). These properties are positively specified for ASNs only, since RNs behave like non-derived lexical nouns and do not present any such particularities. The reasoning is that ASNs have verbal properties (i.e., event structure as in 6), which will impose restrictions on their nominal behavior (e.g., must appear in the singular) or make them compatible with verb-specific modifiers (e.g., aspectual adverbials).

Table 1: Morphosyntactic properties of ASNs vs. RNs


The realization of object arguments is a necessary and sufficient condition for ASNs. It indicates the presence of verbal event structure, which associates with the other ASN-properties. However, the morphosyntactic means to introduce an object argument in nominals is an *of*-phrase, which may also express possession. Given this ambiguity, using an *of*-phrase in combination with other ASN-properties is more reliable. For instance, in (4b), the predicate *took a long time* requires an event as a subject, which shows that *the examination of the patients* is an ASN, while *the exam of the patients* is not. As mentioned above, in the latter case *of the patients* expresses a possessor of the entity *exam*.


Agent-oriented adjectives like *deliberate, intentional, careful* are also taken by Grimshaw (1990: 51–52) to depict ASNs. Like *of*-phrases, possessive marking is ambiguous between expressing subject arguments, as in (9b), and possessive modifiers, as in (9c). Agentive modifiers, however, require verbal event structure with a subject (agent) argument, which cannot be available in the absence of the object argument in (9a) and (9c) (cf. the hierarchy in 6). The contrast between (9a) and (9b) shows that the possessive *the instructor's* cannot introduce the subject argument, if the object argument is not realized.

	- b. The instructor's *intentional/deliberate* examination *of the papers* took a long time. (ASN)
	- c. the instructor's (*\*intentional/\*deliberate*) book

In ASNs, *by*-phrases have a function similar to that of the possessive in (9b): they introduce the subject argument. Yet, like the possessive and *of*-phrases, *by*-phrases may also introduce modifiers. In (10a), the *by*-phrase acts as a modifier of the lexical noun *book*, which has no event structure. In (10b), however, it introduces the subject argument of an ASN, the same way the possessive does in (9b). (10c) is ungrammatical, because the agent-oriented modifiers *intentional/deliberate* require a subject argument, which the *by*-phrase cannot introduce in the absence of event structure and the object: (10c) parallels (9a).

	- b. The intentional/deliberate examination *of the papers by the instructor* took a long time. (ASN)
	- c. \* The intentional/deliberate examination *by the instructor* took a long time.

Given the verbal event structure and the correlated aspectual properties of ASNs, they are expected to allow aspectual adverbials and to obey the aspectual restrictions of their base verbs. In (11a), the telic verb *destroy* allows *in*- but not *for*-adverbials. The correlated ASN in (11b) exhibits the same constraint. By contrast, simple nouns that lexically denote events such as *trip, process* are incompatible with such modifiers in (11c), although they occupy time, as shown by (11d). The latter pattern with RNs (Grimshaw 1990: 58–59).

	- b. The total destruction of the city *in/\*for only 2 days* appalled everyone.



Finally, Grimshaw argues that, due to their verbal structure, ASNs, in general, disallow plural marking, and when plural is available it indicates an RN reading. This is illustrated in (12) from Grimshaw (1990: 54). Related to this and the aspectual contrast in (11), Grimshaw notes that aspectual modifiers like *constant, frequent* will combine with a singular ASN, but with a plural RN. These modifiers require habitual/iterative aspect, which is made available by the event structure of ASNs, but not by the lexicalized RNs. The latter need the plural to contribute the iterative meaning: see (13a)/(13b–c).


In (9) to (13), the contrasts between ASNs and RNs are clear. Yet, depending on the lexical semantics of the individual nouns, the application of these tests may exhibit quite a bit of variation, which led many to challenge Grimshaw's generalizations. For instance, Alexiadou et al. (2010) show that in some languages, ASNs may pluralize provided particular aspectual properties, while Grimm & McNally (2013) and Lieber (2016) challenge some of Grimshaw's claims with counterexamples attested in corpora. However, a general tendency of ASNs to exhibit the properties in Table 1 cannot be denied. At least so far, no corpus study has offered a quantitative analysis to prove that these properties are irrelevant for ASNs. From this perspective, our study can also be viewed as testing the relevance of these properties on the basis of deverbal compounds, which, according to Grimshaw, are headed by ASNs (see Section 2.1.2).
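To make the idea of checking such properties against text more concrete, the following sketch looks for surface cues around a deverbal noun in a single sentence. It is a rough illustration only, not an extraction procedure from this study: the function name `asn_cues`, the cue labels and the regular expressions are all assumptions. The patterns are deliberately naive. They do not distinguish argumental from possessive *of*-phrases, and they do not tell grammatical from ungrammatical adverbials.

```python
import re

# Hypothetical cue inventory, loosely following the ASN properties discussed above.
AGENT_ADJECTIVES = "deliberate|intentional|careful"
TIME_UNITS = "minutes?|hours?|days?|weeks?|months?|years?"

def asn_cues(sentence, head):
    """Return naive surface cues for `head` (a deverbal noun in its singular form)."""
    s = sentence.lower()
    h = re.escape(head.lower())
    return {
        # head immediately followed by an of-phrase
        "of_phrase": bool(re.search(rf"\b{h}\s+of\b", s)),
        # a by-phrase somewhere after the head
        "by_phrase": bool(re.search(rf"\b{h}\b.*\bby\b", s)),
        # regular plural marking on the head
        "plural": bool(re.search(rf"\b{h}s\b", s)),
        # agent-oriented adjective directly before the head
        "agent_adjective": bool(re.search(rf"\b(?:{AGENT_ADJECTIVES})\s+{h}\b", s)),
        # in/for + time expression after the head, e.g. "in only 2 days"
        "aspectual_adverbial": bool(
            re.search(rf"\b{h}\b.*\b(?:in|for)\s+(?:only\s+)?(?:\w+\s+)?(?:{TIME_UNITS})\b", s)
        ),
    }

cues_9b = asn_cues(
    "The instructor's intentional examination of the papers took a long time.",
    "examination",
)
cues_11b = asn_cues(
    "The total destruction of the city in only 2 days appalled everyone.",
    "destruction",
)
```

Counts of such cues over many sentences could serve as per-noun features, although a parsed corpus would be needed to separate arguments from modifiers reliably.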

### **2.1.2 Deverbal compounds between ASNs and RNs: Grimshaw (1990)**

Let us now consider DCs from the perspective of the documented ASN vs. RN contrast. We focus on Grimshaw's analysis of DCs and on Borer (2013), the latter of which reviews Grimshaw's arguments to support an opposite position.


In her study of nominalization, Grimshaw (1990) argues that the heads of DCs (i.e., her *synthetic compounds*) are ASNs. Her reasoning relies on the observation that DCs obey argument structure constraints in the realization of their non-heads. In her model of argument realization, she proposes the hierarchy of argument roles in (14), such that the lower arguments (from right to left) must be realized syntactically before the higher ones. This means that the theme, i.e., the syntactic direct object, must be realized before the goal (indirect object) and the agent (subject). This thematic hierarchy reminds us of the constituent structure of verb phrases in (6).

(14) Agent (subject) > Goal (indirect object) > Theme (direct object)

Grimshaw argues that DCs obey the hierarchy in (14), since they disallow non-heads that realize other arguments than the theme (object). (15) repeats two of her examples. Her explanation is that, when occurring in DCs, deverbal nouns such as *giving* and *reading* are disambiguated to an ASN interpretation.

	- b. *Students* read **books**. DC: **book**-reading by students vs. \**student*-reading of books
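The ordering constraint behind (14) and (15) can be stated as a small check. The sketch below is a simplification under our own assumptions: the role labels, the function name `obeys_hierarchy` and the encoding of a DC as a list of roles ordered from the compound-internal non-head outward are all hypothetical conveniences, not part of Grimshaw's formalization.

```python
# Prominence ranks following (14): theme is lowest, agent is highest.
RANK = {"theme": 0, "goal": 1, "agent": 2}

def obeys_hierarchy(realization_order, verb_roles):
    """Check whether arguments are realized from the lowest role upward.

    realization_order: roles listed from the compound-internal non-head outward,
                       e.g. ["theme", "agent"] for "book-reading by students".
    verb_roles:        the roles in the base verb's argument structure.
    """
    if not realization_order:
        return True
    ranks = [RANK[role] for role in realization_order]
    lowest_available = min(RANK[role] for role in verb_roles)
    # The non-head must realize the lowest role of the verb ...
    starts_lowest = ranks[0] == lowest_available
    # ... and every later argument must be higher than the one before it.
    ascending = all(a < b for a, b in zip(ranks, ranks[1:]))
    return starts_lowest and ascending

READ_ROLES = {"agent", "theme"}
book_reading_by_students = obeys_hierarchy(["theme", "agent"], READ_ROLES)
student_reading_of_books = obeys_hierarchy(["agent", "theme"], READ_ROLES)
```

On this encoding, *book-reading by students* passes the check, while *student-reading of books* fails it because the agent is realized before the theme.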

In contrast to suffix-based deverbal nouns as in (15), she considers zero-derived nouns like *a sting* and *a bite* to always be RNs. She shows that the compounds these may head need not obey the hierarchy in (14) and allow agent non-heads. The grammatical compounds in (16) are RCs for Grimshaw.

(16) **bee** sting (vs. \*bee-stinging), **dog** bite (vs. \*dog-biting)

### **2.1.3 Deverbal compounds between ASNs and RNs: Borer (2013)**

In spite of her extensive study on ASNs, Grimshaw does not go to great lengths to compare DCs with ASNs in terms of morphosyntactic properties such as those in Table 1. Di Sciullo (1992) investigates some of these tests in further support of the similarity between DC heads and ASNs. However, two decades later, Borer (2013) challenges Grimshaw's analysis of DCs by using some of these morphosyntactic tests. She argues that the behavior of DCs essentially differs from that of ASNs, and proposes that all DCs are headed by RNs.

We retain three of Borer's arguments. First, she argues that, unlike ASNs, DCs disallow aspectual *in*/*for*-adverbials and, second, that they also disallow argumental *by*-phrases. This contrast is illustrated in (17) (cf. 11 and 10). In Borer's system, the unavailability of aspectual modifiers indicates that event structure (with arguments) is entirely missing from DCs, so they cannot involve ASNs. Her conclusion is that DCs are headed by RNs and behave just like RCs.


able as a subject reading, depending on context. As evidence, she quotes DCs as in (18), parallel to those in (1b), whose non-heads may correspond to subjects.

(18) teacher recommendation, court investigation, government decision

Some criticism and re-interpretation of Borer's facts is found in Iordăchioaia et al. (2017) and Iordăchioaia (to appear). We briefly note here that aspectual adverbials are barely ever attested in corpora even with ASNs (Lieber 2016: 39–42), so an extensive empirical study is necessary to determine how much DCs differ from ASNs in this respect. Furthermore, *by*-phrases are broadly attested with DCs in corpora, as Grimshaw's (15b) also predicts, but they usually involve bare plurals and not definite noun phrases or proper names as in Borer's (17c–d). Given that DCs are often generic, this restriction is natural.

Having summarized these two theoretical approaches to DCs, we may add that we do not aim to argue for one or the other. Instead, we use morphosyntactic properties whose pertinence for ASN-hood is accepted by both to guide us in evaluating the impact of the head noun on the interpretation of the DC. Our hypothesis that a high level of ASN-hood in DC heads correlates with an object reading of the non-heads, however, follows Grimshaw's intuition that "true" DCs involve ASN heads and are fully compositional. By contrast, Borer's claim is that DCs are always ambiguous like RCs and never as compositional as ASNs. Given that our results support the correlation between ASN-properties and an object reading in DCs, they also bring some evidence against Borer's analysis.

## **2.2 Computational approaches to compounds**

Compounds have been the focus of quite a number of papers in the field of computational linguistics (CL) and NLP. Given the topic of this paper, two strands of research are most relevant. The first focuses on determining the relation between the two components of a compound, the head and the non-head.

### 3 Compositionality in English deverbal compounds: The role of the head

For our study this work is relevant to the extent that it discusses compounds whose head is a deverbal noun. The second strand of research is concerned with modeling the lexico-semantic transparency of noun-noun compounds. We will start by discussing the former and finish with an overview of the work that predicts the degree of transparency in compounds.

### **2.2.1 Predicting the interpretation of deverbal compounds**

The goal of computational work on deverbal compounds (referred to as nominalizations) has been to predict the relation between the non-head and the deverbal head. The relation inventory has varied from two classes, obj and subj, in Lapata (2002), to three classes, obj, subj and prepositional complement in Nicholson & Baldwin (2006), and to 13 classes – obj, subj and further specifications of the prepositional complement in Grover et al. (2005).

These works have mostly focused on encyclopedic, usage-based features such as the syntactic relations attested between the base verb of the head noun and the non-head in large corpora. The underlying assumption is that the frequency distribution of syntactic relations between a given noun and a verb, for example, between *taxi* and *drive*, is a good estimate for the distribution of the underlying relation between *taxi* and *driver*. Additional pragmatic knowledge is obtained from the direct context of the compound. In selecting these pragmatic features, these works are in line with lexicalist theoretical approaches that list several covert semantic relations typically available in compounds (cf. most notably, Levi 1978; see Fokkens 2007, for a critical overview). In addition to these pragmatic features, some straightforward morphological features are selected, such as the suffix of non-heads ending in *-ee* and *-er* (Lapata 2002).
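The frequency-based assumption described above can be sketched as follows. This is a minimal illustration, not the method of any of the cited studies: the function names and the toy observations are hypothetical, and the input is assumed to be a list of (noun, verb, relation) triples already extracted from a parsed corpus.

```python
from collections import Counter


def relation_distribution(observations, noun, verb):
    """Relative frequency of each syntactic relation attested for (noun, verb)."""
    counts = Counter(rel for n, v, rel in observations if n == noun and v == verb)
    total = sum(counts.values())
    return {rel: c / total for rel, c in counts.items()} if total else {}


def most_likely_relation(observations, noun, verb):
    """Pick the most frequent relation as the estimate for the compound."""
    dist = relation_distribution(observations, noun, verb)
    return max(dist, key=dist.get) if dist else None


# Toy data (invented for illustration): 'taxi' is mostly the object of 'drive'.
observations = [
    ("taxi", "drive", "obj"),
    ("taxi", "drive", "obj"),
    ("taxi", "drive", "obj"),
    ("taxi", "drive", "subj"),
]
```

With these toy counts, the estimate for the pair *taxi* and *drive* would be an obj relation; a pair that is never attested yields no estimate at all, which is the sparseness problem such approaches have to address.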

Our study differs from these works in several ways. First, our aim is not to reach state-of-the-art performance in prediction, but to test linguistic hypotheses by measuring the predictive power of the various features discussed in theoretical linguistics, which are also indicative of the compositionality of the compound.

Second, and related to the previous point, our features are all head-specific. This is because, following Grimshaw's theory, the behavior of the derived nominal heads (as ASNs or RNs) should mirror the structural correlation between DCs and the compositional structure of the original verb. The presence (or absence) of such a correlation is expected to have a great impact on the relation between the head and the non-head. In order to measure the individual impact of these theoretically-defined features, we do not rely on pragmatic features that involve both the head and the non-head as in the studies above.

Lastly, because our goal is to uncover to what extent the behavior of the derived nominals (as ASNs or RNs) can predict the relation between head and non-head, we carefully selected equal numbers of DCs with the suffixes *-al*, *-ance*, *-ing*, *-ion*, and *-ment*. These suffixes are all ambiguous in their formation of ASNs and RNs, so we eliminate any bias for particular readings (cf. *-ee* and *-er*, Section 3.2).

### **2.2.2 Predicting the degree of transparency in noun-noun compounds**

For the transparency of compounds two types of CL work are relevant, which focus on different tasks, but share the same assumptions. One type aims to predict the meaning of compounds based on composite functions between the vector-based representations of their parts, e.g., Ó Séaghdha (2008) and Mitchell & Lapata (2010). These works compare different types of mathematical functions for the combination of the vectors for heads and non-heads to best represent the meaning of compounds. In the same spirit, but closer to our interest in the syntactic-semantic relationship between the parts, Marelli & Baroni (2015) and Baroni & Zamparelli (2010) investigate linguistically-informed composite functions.

The other line of work aims to predict the degree of lexico-semantic transparency (i.e., what they call "compositionality"; cf. Section 1.3) of compounds. For this, they compare the vector-based representations of the parts and composite functions to the vector-based representations of the compound as a whole, e.g., Schulte im Walde, Hätty & Bott (2016); Reddy et al. (2011).

This second line of work also draws upon psycholinguistic insights, such as Libben et al. (1997; 2003), who group noun-noun compounds into four different categories, depending on the transparency of the head and the non-head. The four classes are: tt for compounds with both a transparent head and non-head, oo for compounds with opaque heads and non-heads, and ot and to for compounds whose parts differ along the dimension of transparency. They found that both semantically opaque and semantically transparent compounds show morphological constituency. However, they found the semantic transparency of the head to play a significant role. This confirms previous results from the psycholinguistic literature (Zwitserlood 1994).

In this literature, several datasets have been created, which collect human ratings on the degrees of lexico-semantic transparency of compounds with respect to their constituents: e.g., in English (Reddy et al. 2011; Juhasz et al. 2015) and in German (Schulte im Walde, Hätty, Bott & Khvtisavrishvili 2016). Schulte im Walde, Hätty, Bott & Khvtisavrishvili (2016) have enriched the semantic transparency ratings with several empirical features related to the constituents of the compound in order to measure the influence of these features on the transparency of the compound. These features include the frequency of the compound and its parts, and the productivity and ambiguity of its parts.


Schulte im Walde, Hätty & Bott (2016) use vector space models to model the meaning of the compounds and their parts. Subsequently, they model the transparency of the compound by measuring the distance between the composite vector of its parts and the vector for the actual compound. The assumption behind this work is that the vectors of transparent compounds should be closer to the composite function of the vectors of their parts than the vectors of opaque compounds.
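The idea of comparing a composite vector with the compound's own vector can be sketched as below. This is only an illustration under two assumptions that are not taken from the cited work: the composition function is plain vector addition, and closeness is measured by cosine similarity. The function names are hypothetical.

```python
import math


def add_vectors(u, v):
    """Additive composition of two equally long vectors (an assumed choice)."""
    return [a + b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transparency_score(modifier_vec, head_vec, compound_vec):
    """Higher values mean the compound vector is closer to the composed parts."""
    return cosine(add_vectors(modifier_vec, head_vec), compound_vec)
```

Under this sketch, a transparent compound would receive a score near 1 and an opaque one a score near 0; any other composition function can be swapped in for `add_vectors` without changing the comparison step.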

The main question Schulte im Walde, Hätty & Bott (2016) try to answer is whether the above-mentioned properties (frequency of the compound and its parts, productivity, and ambiguity of its parts) play a major role in the quality of the predictions. They found that for the head all properties had a significant effect on the predictions, whereas for the modifier the effect was not consistent. This converges with our results in predicting the compositionality of DCs from the properties of the head.

Furthermore, they attribute the influence of these features to the underlying ambiguity that they seem to be correlated with: e.g., frequent heads that are highly productive are often highly ambiguous. We note, however, that these studies are not concerned with DCs, as ours is, but especially with what we call RCs, some of which are lexico-semantically less transparent than our DCs (cf. *hogwash*).

## **3 Methodology**

In this section we present the corpus and the tools for automatic pre-processing, the procedure in the DC extraction, as well as the annotation and post-processing of our collection of DCs.

## **3.1 Corpus and tools**

For the selection of DCs and to gather corpus statistics on them, we exploited the Annotated Gigaword corpus (Napoles et al. 2012), one of the largest general-domain English corpora, which contains several layers of linguistic annotation. This corpus encompasses ten million documents from seven news sources and more than four billion words. We made use of the following available automatic preprocessing steps and annotations, which we accessed via the Java API provided along with the corpus: sentence segmentation (Gillick 2009), tokenization, lemmatization and POS tags (Stanford's CoreNLP toolkit<sup>6</sup>), and constituency parses (Huang et al. 2010) converted to syntactic dependency trees with Stanford's CoreNLP toolkit. The POS tags adhere to the Penn Treebank tagset (Santorini 1990); the dependency relations follow the Stanford typed dependencies (de Marneffe & Manning 2008). As news outlets often repeat news items in subsequent news streams, the corpus contains a considerable amount of duplication. To improve the reliability of our corpus counts, we removed exact duplicate sentences within each of the 1010 corpus files, reducing the corpus size by 16%.
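The deduplication step can be illustrated with a short sketch. It assumes that the sentences of one corpus file are available as a list of strings and that "exact duplicate" means string identity; the function names are hypothetical and the actual implementation is not described in the text.

```python
def dedupe_sentences(sentences):
    """Keep the first occurrence of each sentence string, preserving order."""
    seen = set()
    unique = []
    for sentence in sentences:
        if sentence not in seen:
            seen.add(sentence)
            unique.append(sentence)
    return unique


def size_reduction(original, deduped):
    """Share of sentences removed, as a fraction between 0 and 1."""
    return 1 - len(deduped) / len(original) if original else 0.0
```

Applying `dedupe_sentences` separately to each file, rather than to the whole corpus at once, mirrors the per-file scope stated above and keeps the set of seen sentences small.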

## **3.2 Extraction of deverbal compounds**

We created a balanced collection of DCs, which we extracted from the Gigaword corpus. We first gathered 25 nouns (over three frequency bands: high, medium, low) for each of the suffixes *-al*, *-ance*, *-ion*, *-ing*, and *-ment*. The highest frequency band ranges from 4.5 to 3.5 on the Zipf-scale (van Heuven et al. 2014), the medium frequency band ranges from 3 to 2.5, and the lowest one from 2 to 1.5. The suffixes may form both ASNs and RNs according to Grimshaw (1990).
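For readers unfamiliar with the Zipf scale, the following sketch shows how a raw count maps onto it and onto the three bands above. The Zipf value is the base-10 logarithm of the frequency per million words plus 3. Which frequency source was used for the nouns is not stated in the text, so the `count` and `corpus_size` arguments are placeholders, and the function names are hypothetical.

```python
import math

# Band limits as given in the text; values between bands belong to no band.
BANDS = {"high": (3.5, 4.5), "medium": (2.5, 3.0), "low": (1.5, 2.0)}


def zipf_value(count, corpus_size):
    """Zipf value: log10(frequency per million words) + 3."""
    per_million = count * 1_000_000 / corpus_size
    return math.log10(per_million) + 3


def frequency_band(zipf):
    """Return 'high', 'medium' or 'low', or None if the value falls in a gap."""
    for name, (lower, upper) in BANDS.items():
        if lower <= zipf <= upper:
            return name
    return None
```

Note that the bands are not contiguous: a noun with a Zipf value of 3.2, for instance, falls between the medium and high bands and would not be selected under this reading.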

We did not consider zero-derived nouns like *attack*, *abuse*, *bite*, because Grimshaw considers them RNs (see 16). We also excluded deverbal nouns based on the suffixes *-er* and *-ee*, as they denote event participants corresponding to the subject and the object of the base verb, respectively, implicitly blocking this interpretation on the non-head (cf. *police trainee* – *dog trainer*). In our attempt to capture the closeness of DCs to ASNs (and the base verbs), we considered only the suffixes that build eventive nominals, which could realize both a subject and an object argument. DCs headed by *-ee* and *-er* nouns would have been biased for one or the other. However, our selection of suffixes represents the large majority of deverbal nouns. They make up 69.4% of the total number of deverbal nouns in the NOMLEX database (Macleod et al. 1998), which consists of 1025 lexicalized deverbal nouns.

<sup>6</sup>http://nlp.stanford.edu/software/corenlp.shtml

The nouns were selected such that their base verbs present transitive uses, making both subjects and objects available.<sup>7</sup> For illustration, Table 2 offers samples of deverbal nouns for each frequency range and suffix. For each selected noun we then extracted the 25 most frequent compounds that it appeared as the head of, where available. A few deverbal nouns (in particular those with suffixes *-al* and *-ance*) were less productive in compounds and appeared with fewer than 25 different non-heads. Given these gaps and after removing a few repetitions due to capitalization, we obtained a collection of 3111 DCs.


## **3.3 Annotation and post-processing of DCs**

### **3.3.1 Interpretation of (non-heads in) DCs**

All DCs were annotated by three trained American English speakers, who had a university-level background in linguistics. They had to label the DCs as obj(ect), subj(ect), other, or error, depending on the syntactic relationship that they considered the DC to establish between the base verb of the head noun and the non-head. For instance, DCs such as in (1) would be labeled as obj (1a), subj (1b), and other (1c). The label other was an umbrella label for prepositional objects (e.g., *adoption counseling* 'somebody counsels somebody *on adoption*') and various adjuncts (e.g., *ultrasound examination* 'to examine somebody *with an ultrasound*', *sea burial* 'to bury somebody *by the sea*', *surprise arrival* 'somebody/something arrived *by surprise*'). The label error was intended to identify errors of the POS tagger (e.g., *face abandonment* originates in 'they face abandonment'), but was also employed by the annotators when they considered the DC uninterpretable or ungrammatical. We allowed the annotators to use multiple labels and to indicate ambiguity (using "–") and the preferred order of the readings (using ">").

<sup>7</sup>*Arrive* is the only intransitive unaccusative verb that realizes the object/internal argument as a subject.

We used the original annotations to create a final list of compounds with the labels that all three annotators agreed on. For ambiguously labeled DCs we selected the one reading available for all three. If they all indicated the same ambiguity for a DC, we labeled the DC as ambiguous. The labels we used for the final dataset are obj, subj, other, dis(agreement between annotators), ambig(uous), and error. In spite of Borer's (2013) claims, we found only two cases of ambiguity which all three annotators agreed on – namely, *police killing* and *doctor referral*, which were both labeled subj–obj. In the end we identified 772 dis, 1377 obj, 404 other, 286 subj, and 270 error cases of DCs. After removing the disagreements, the two ambiguous DCs and the errors, we obtained 2067 DCs. We based our study on the agreed-upon relations only. We note, however, that the simple inter-annotator agreement (IAA) among the three annotators, excluding the errors, was 72.8%. In a previous study with only two annotators (Iordăchioaia et al. 2016), the IAA was 81.5%.
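The aggregation rules in this paragraph can be made explicit with a small sketch. It is one possible reading of the procedure, not the authors' script: the function names are hypothetical, annotations are assumed to be strings such as `obj`, `subj>obj` or `subj–obj`, and cases where the annotators share more than one reading without agreeing on the same ambiguity are treated as disagreements (an assumption).

```python
import re


def parse_annotation(raw):
    """Split e.g. 'subj>obj' or 'subj–obj' into the set of readings it lists."""
    parts = re.split(r"[–>\-]", raw)
    return {part.strip().lower() for part in parts if part.strip()}


def final_label(annotations):
    """Derive the final label from the three annotators' raw labels."""
    readings = [parse_annotation(a) for a in annotations]
    # All annotators indicated the very same ambiguity.
    if len(readings[0]) > 1 and all(r == readings[0] for r in readings):
        return "ambig"
    # Exactly one reading is available for all annotators.
    shared = set.intersection(*readings)
    if len(shared) == 1:
        return next(iter(shared))
    return "dis"
```

For example, `final_label(["obj", "subj>obj", "obj"])` resolves to obj because that is the one reading all three annotators allow, whereas three identical `subj–obj` labels, as for *police killing*, yield ambig.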

We kept two versions of the data: one in which the classes other and subj are separate and one in which we conflated them to nobj (non-object). Given the purpose of this paper, i.e., verifying to what extent the obj reading of a DC correlates with particular morphosyntactic properties of the head noun, we focus here on the binary classification. The resulting data set is skewed with obj prevailing: 1377 obj and 690 nobj.

### **3.3.2 Process vs. result readings in DCs**

An additional annotation task concerned feature "7. *process-vs-result*" from Table 3 in Section 4.1. This feature was designed to capture the three annotators' judgments with respect to how close the interpretation of the DC comes to the ASN and the verbal expression of a process/event in which the non-head is realized as subj, obj, or other. They had to rate DCs from 5 (very prominent process) to 1 (no process = result) (see Grimshaw 1990).

We first explained the difference between an ASN and an RN to them as follows: "*The teacher's assignment of tasks* expresses a process in which the teacher assigns tasks. However, in *this long assignment took several hours to complete*, the noun *assignment* is interpreted as a result of the process of assigning something – namely, the task itself." We then instructed the annotators to check this contrast in DCs like *task assignment* and *Math assignment* and rate the ones that relate to the process as closer to 5 and those that relate to the result as closer to 1. Another example was *apartment building*, which should be rated as closer to 5, if they interpret it as 'to build apartments', and closer to 1, if they interpret it as 'a building with apartments'. We fully encouraged the annotators to employ the scores 4, 3, 2 for unclear cases.

During this task, the annotators had access to their previous subj/obj/other annotation labels for each DC and could compare different DCs headed by the same head noun. In terms of the variation of ratings between DCs headed by the same noun, one annotator in particular assigned pretty similar scores, although the contrast was clear. This annotator also showed a tendency towards the extremes: either 5 or 1. In general, the task was perceived as difficult, especially by this annotator. We multiplied the scores from 5 to 1 by 20 to use them as percentages. For each DC we calculated the average between the three annotations obtaining values between 20 and 100.
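The rescaling and averaging described here amounts to the following arithmetic. The function name is hypothetical; the only assumption is that each DC comes with one integer rating from 1 to 5 per annotator.

```python
def process_result_score(ratings):
    """Rescale each 1-5 rating to a percentage (x20) and average them."""
    if not ratings or any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(r * 20 for r in ratings) / len(ratings)
```

Three ratings of 5 give 100, three ratings of 1 give 20, and mixed ratings such as 5, 4 and 3 give 80, so every DC ends up with a value between 20 and 100.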

## **4 Feature selection**

## **4.1 Theoretical considerations**

To collect information on the properties of the head nouns in DCs, we defined a total of nine features, given in Table 3.

The first seven features are inspired by Grimshaw (1990), although only the first four directly correspond to the properties in Section 2.1.1. Two adjustments led us to four features instead of the six properties in Table 1: first, *in*/*for*-adverbials were discarded, because we found close to no relevant data; second, we counted agent-oriented and aspectual adjectives together, as they were also very few.<sup>8</sup> In line with our hypothesis, we expect all these seven features to have predictive power and to point to an obj interpretation of the DCs.

Feature *of\_outside\_DC* encodes the first property in Table 1. Here we counted the percentage of occurrences of a (singular) head noun in which it also realizes an *of*-phrase. For feature *by\_outside\_DC* (i.e., the third property in Table 1), we collected the frequency of a *by*-phrase with a head noun. Feature *sum\_adjectives* collects all the (singular form) occurrences of the head nouns in a modifier relation with agent-oriented or aspectual adjectives (cf. second and fifth property in Table 1).<sup>9</sup> Feature *sg\_outside\_DC* measures the percentage of singular occurrences of the head noun out of its total occurrences in the corpus (cf. last property in Table 1).
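As a rough illustration, the four outside-compound features can be derived from raw counts as sketched below. The denominators reflect one reading of the descriptions (singular occurrences outside compounds for the first three features, total occurrences for the last) and should be taken as assumptions; the dictionary keys and function names are hypothetical.

```python
def percentage(part, whole):
    """Percentage of part in whole; 0.0 when the denominator is zero."""
    return 100.0 * part / whole if whole else 0.0


def outside_dc_features(counts):
    """Compute features 1-4 for one head noun from its corpus counts."""
    singular = counts["sg_outside"]  # singular occurrences outside compounds
    return {
        "of_outside_DC": percentage(counts["sg_with_of"], singular),
        "by_outside_DC": percentage(counts["sg_with_by"], singular),
        "sum_adjectives": percentage(counts["sg_with_adj"], singular),
        # Denominator assumed to be all occurrences of the head noun.
        "sg_outside_DC": percentage(singular, counts["total_occurrences"]),
    }


# Invented counts for a single head noun, for illustration only.
example_counts = {
    "sg_outside": 200,
    "sg_with_of": 50,
    "sg_with_by": 10,
    "sg_with_adj": 2,
    "total_occurrences": 400,
}
```

With the invented counts above, a quarter of the singular occurrences would realize an *of*-phrase and half of all occurrences would be singular and outside a compound.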

<sup>8</sup>We initially collected data on *in*- and *for*-adverbials, but only a few nouns had such occurrences. At closer inspection even these examples turned out not to illustrate *in*- and *for*-phrases that modify the telic/atelic aspect of the head noun, as Grimshaw and Borer used them. Instead, they mostly functioned as temporal modifiers, and we therefore discarded this feature.

<sup>9</sup>Note that, given Grimshaw's assumption that ASNs do not appear in the plural, we counted all of these occurrences in the singular form of the head noun.

| Feature label | Description and illustration |
|---|---|
| 1. *of\_outside\_DC* (Grimshaw 1990) | Percentage of the head's occurrences as singular outside compounds which realize a syntactic relation with an *of*-phrase. E.g., *assignment of problems* |
| 2. *by\_outside\_DC* (Grimshaw 1990) | Percentage of the head's occurrences in the singular outside compounds which realize a syntactic relation with a *by*-phrase. E.g., *assignment (of problems) by teachers* |
| 3. *sum\_adjectives* (Grimshaw 1990) | Percentage of the head's occurrences in a modifier relation with one of the adjectives *frequent*, *constant*, *intentional*, *deliberate*, or *careful*. |
| 4. *sg\_outside\_DC* (Grimshaw 1990) | Percentage of the head's occurrences as singular outside compounds. |
| 5. *by\_inside\_DC* (≈ 2. *by\_outside\_DC*) | Percentage of the head's occurrences as singular inside compounds which realize a syntactic relation with a *by*-phrase. E.g., *task assignment by teachers* |
| 6. *sg\_inside\_DC* (≈ 4. *sg\_outside\_DC*) | Percentage of the head's occurrences as singular inside compounds. |
| 7. *process-vs-result* (≈ ASN vs. RN) | Native speaker annotation of each DC as a process (*car driving*) or result (*apartment building*) on a scale from 5 to 1. |
| 8. *suffix* NEW | Suffix of the head noun: *-al* (rent**al**), *-ance* (insur**ance**), *-ing* (kill**ing**), *-ion* (destruct**ion**), *-ment* (treat**ment**) |
| 9. *head\_in\_DC* NEW | Percentage of the head's occurrences within a compound out of its total occurrences in the corpus. |

Table 3: Indicative features for head nouns

Grimshaw's properties in Table 1 characterize deverbal nouns as ASNs when they appear on their own, i.e., *outside* compounds. This is why features 1. to 4. are labeled correspondingly. Yet, if DCs are supposed to resemble ASNs, we considered that their head nouns should preserve these properties also within DCs, i.e., when the head noun is *inside* a DC.<sup>10</sup> For this reason, we also introduced the features *sg\_inside\_DC* and *by\_inside\_DC*. The former measures the percentage of singular DCs out of their total occurrences, and the latter the percentage of DCs that realize a *by*-phrase. We did not test *of*-phrases inside DCs, since DCs usually realize the object as a non-head (see our annotation results in Section 3.3.1) and collecting such occurrences would have mostly delivered noise. The adjectives modifying DCs were also left out, because they were almost nonexistent.

There are two caveats to these features inspired by Grimshaw (1990). First, as we noted in Section 2.1.1, the individual ASN-properties are not fully reliable in determining ASN-hood: e.g., there is ambiguity in argument marking (i.e., *of* and *by*-phrases), and deverbal nouns are easily coerced between the readings. For this reason, Grimshaw used several such properties together in her examples. However, we extracted these data from corpora, and most of the attestations were too few to allow any combined patterns beyond the one we ensured – that of a singular form of the head noun in each of the other properties. Second, and related to this, basing our study on a corpus comes with the risk that, no matter how large the corpus, it may not present enough relevant data. It was for these two reasons that we considered adding three more head-related features to our study. We first gathered native-speaker intuitions about the ASN vs. RN status of the head nouns in DCs (see feature *process-vs-result*) and supplemented Grimshaw's tests with information about the suffix and the frequency of the head noun within compounds (features *suffix* and *head\_in\_DC*).

We designed feature *process-vs-result* (P-R) in order to grasp Grimshaw's intuition about the contrast between ASNs and RNs by means of introspection. The process vs. result interpretation is the fundamental difference between ASNs and RNs in Grimshaw's understanding. It can be seen as the latent variable that her morphosyntactic properties are intended to identify: ASNs express processes or events like the corresponding verbs, while RNs depart from this meaning and express results. Following this annotation (see Section 3.3.2), we gathered information on how salient the verbal process is in the meaning of a DC and, indirectly, how accessible the compositional structure of the base VP is within the DC.<sup>11</sup>

<sup>10</sup>Di Sciullo (1992) and Borer (2013) apply the same reasoning.

<sup>11</sup>The way we gathered estimates for our P-R feature comes close to the NLP studies which gather native speaker evaluations about the transparency of compounds. Namely, our three annotators had to evaluate how close the morphosyntactic (and semantic) relationship between the head noun and the non-head comes to the fully compositional relationship between the corresponding verb and its argument or adjunct.

The last two features *suffix* and *head\_in\_DC* represent two further properties of the head nouns that we considered interesting for our study. The theoretical literature does not offer much on suffixes. *-Ing* has received most attention, to the extent that Grimshaw argued that it always forms ASNs, while Borer claims that it encodes what she calls an originator (i.e., subject argument), with the effect that in compounds, the subj reading is blocked for non-heads and obj is favored. Neither contention is true. First, *-ing* presents several examples of RNs (see *building(s)*, *writing(s)*, *reading(s)*). Second, we *do* find subj-DCs headed by *ing*-nouns (see 1b). In general, the information on the suffix is independent of ASN-hood, since all suffixes allow both ASN and RN readings, but we aimed to check whether some suffixes may be more informative than others.

Feature *head\_in\_DC* gives us the degree of compoundhood of a deverbal noun, i.e., how likely it is to appear within a compound. The expectation is that a noun that typically appears in compounds has undergone some meaning specialization, which requires another noun to be instantiated. One may rightly say that this makes the meaning of such head nouns less transparent than for those that freely appear both within and outside compounds. However, for deverbal nouns, to the extent that this slight meaning specialization requires a particular type of non-head, it can give us useful information about which (morpho)syntactic relationship between the base verb and one of its arguments is most likely to form a DC. If it is a non-obj relation, this shows that compositionality as in (6) is not a typical condition in the formation of DCs, weakening the relevance of our investigation. However, our results in Table 7 below indicate that high compoundhood correlates with an obj interpretation of the non-head, which supports the relevance of compositionality in the formation of DCs.

## **4.2 Technical support**

To obtain statistics for the morphosyntactic features, we extracted counts for the selected DCs and their head nouns from the Gigaword corpus by matching patterns defined over word forms, lemmas, POS tags and dependency relations, as provided by the automatic corpus annotations. The specific patterns used for each feature are detailed in the following.

For the *inside\_DC* features we extracted DCs from the Gigaword corpus by locating two adjacent nouns according to the POS tags NN for singular nouns and NNS for plural nouns, and excluding noun pairs directly preceded or succeeded by other nouns or proper nouns (POS tags NNP and NNPS). DCs were matched with the word form of the non-head and the lemma of the head, thereby extracting singular and plural occurrences. We determined the grammatical number of a noun or compound by its POS tag or the POS tag of its head, respectively. For example, we matched *security training(s)*, but not *airport security training* and *security training instructor*, to make sure that we do not extract parts of larger compounds. Conversely, the *outside\_DC* features apply to head nouns (matched by their lemma and POS tag NN or NNS) without any noun or proper noun next to them.
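The adjacency pattern for compounds can be sketched as follows. This is an illustrative reimplementation, not the authors' Java code: sentences are assumed to be lists of `(word_form, lemma, pos)` triples, the function name is hypothetical, and lowercasing the non-head is an added assumption.

```python
NOUN_TAGS = {"NN", "NNS"}
BLOCKING_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def find_compounds(tokens, head_lemmas):
    """Return (non-head form, head lemma, number) for two-noun compounds.

    A pair is skipped when a noun or proper noun directly precedes or
    follows it, so that parts of larger compounds are not extracted.
    """
    matches = []
    for i in range(len(tokens) - 1):
        (form1, _, pos1), (_, lemma2, pos2) = tokens[i], tokens[i + 1]
        if pos1 not in NOUN_TAGS or pos2 not in NOUN_TAGS:
            continue
        if lemma2 not in head_lemmas:
            continue
        before = tokens[i - 1][2] if i > 0 else None
        after = tokens[i + 2][2] if i + 2 < len(tokens) else None
        if before in BLOCKING_TAGS or after in BLOCKING_TAGS:
            continue
        number = "pl" if pos2 == "NNS" else "sg"  # number comes from the head
        matches.append((form1.lower(), lemma2, number))
    return matches
```

Given the head lemma *training*, this sketch would match *security training* and *security trainings*, but neither *airport security training* nor *security training instructor*.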

Figure 1: Illustration of morphosyntactic patterns to extract DCs heading *of-phrases* (top) and *by-phrases* (bottom)

We counted a DC (or its head noun) as being in a syntactic relation with an *of*-phrase or *by*-phrase, if it (or its head) governed a collapsed dependency labeled "prep\_of"/"prep\_by"<sup>12</sup>, as in Figure 1. Since we were interested in prepositional phrases that realize internal or external arguments, but not in temporal phrases (e.g., *by Monday*) or fixed expressions (e.g., *of age*, *by chance*), we excluded phrases headed by words that typically appear in these undesired constructions. We semi-automatically compiled these lists based on a multiword expression lexicon<sup>13</sup> and manually added entries. To compute the feature *sum\_adjectives* we counted how often each noun outside a DC governs a dependency relation labeled "amod", where the dependent is an adjective (POS tag JJ) out of the lemmas *intentional, deliberate, careful, constant*, and *frequent*.
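The dependency checks described in this paragraph might look roughly like the sketch below. The data layout is an assumption: each dependency is a tuple `(relation, governor_index, dependent_lemma, dependent_pos)`. The exclusion lists contain only the three examples mentioned in the text and merely stand in for the actual lists, which are not reproduced here; the function names are hypothetical.

```python
# Stand-ins for the real exclusion lists (only the examples from the text).
EXCLUDED = {"prep_of": {"age"}, "prep_by": {"monday", "chance"}}
ASN_ADJECTIVES = {"intentional", "deliberate", "careful", "constant", "frequent"}


def governs_pp(deps, governor, relation):
    """True if the noun at index `governor` heads an argument-like PP."""
    return any(
        rel == relation and gov == governor and lemma.lower() not in EXCLUDED[relation]
        for rel, gov, lemma, _ in deps
    )


def has_asn_adjective(deps, governor):
    """True if the noun is modified (amod) by one of the five target adjectives."""
    return any(
        rel == "amod" and gov == governor and pos == "JJ" and lemma.lower() in ASN_ADJECTIVES
        for rel, gov, lemma, pos in deps
    )
```

Because collapsed dependencies link the noun directly to the complement of the preposition, a single pass over the arcs is enough; no separate lookup of the preposition token is needed.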

<sup>12</sup>By conflating dependencies involving prepositions or conjuncts, collapsed dependencies directly link content words. This simplifies the extraction patterns, as we can obtain the complement of the prepositional phrase depending on the noun or the DC, by following a single dependency arc.

<sup>13</sup>http://www.cs.cmu.edu/~ark/LexSem/

## **4.3 Reliability of the extracted features**

Our extracted features rely on the automatic corpus annotations, the manually defined extraction patterns, and, in the case of the *of*-phrases and *by*-phrases, on heuristics to exclude undesired matches of temporal phrases or fixed expressions. The constituency parser, which was used to obtain the syntactic analyses then converted to dependency trees, obtained an average F1-score of 91.4% on a standard test set, Section 22 of the Wall Street Journal corpus (Huang et al. 2010).

To measure the reliability of the extracted features, in particular of the most error-prone features based on heuristics, we conducted a manual analysis of the counts of head nouns that appear in conjunction with *of*-phrases and *by*-phrases, by way of example. For this, we implemented the following pattern to extract all candidate sentences in the corpus for this feature. We selected all sentences in which one of the target head nouns outside a compound was followed by a token with lemma *of* or *by* and POS tag IN, not separated by a punctuation mark.<sup>14</sup> On the one hand, this was driven by the motivation to keep the number of sentences at a manageable level and focused on the feature of interest. On the other hand, we designed the pattern to maximize recall so as not to miss out on any true positives. We then randomly selected 2000 of these sentences for each preposition for a manual annotation of the target features by a single human annotator. A comparison of the annotated instances with the automatically extracted instances revealed a precision of 91.0% and recall of 90.1% for *of*-phrases, while the results for *by*-phrases were lower (85.0% precision, 73.8% recall).
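Precision and recall in such a comparison follow the standard definitions, which the sketch below spells out. It assumes that manually annotated and automatically extracted instances are identified by comparable IDs (for instance, sentence numbers); the function name is hypothetical.

```python
def precision_recall(gold, predicted):
    """Precision and recall of predicted instances against gold annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

A lower recall than precision, as reported for *by*-phrases, means that the automatic patterns miss more true instances than they wrongly include.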

## **5 Data exploration with machine learning techniques**

Our goal is to test the features listed in Table 3 for their predictive power in determining the relation between the head and the non-head. These features are composed of numerical (1 to 7, and 9) and categorical features (8). The dependent variable is a binary feature that varies between one of the two annotation labels, obj and nobj. We trained a logistic regression classifier to model the effect of these features.<sup>15</sup>

We divided the data described in §3.3.1 into a test and a training set. Because the features are all head-specific, as can be seen in Table 3, the model was tested on a test set for which we ensured that neither compounds, nor heads were seen in the training data. Therefore, we randomly selected two mid-frequency heads

<sup>14</sup>We used the following list of punctuation characters: ".", "?", "!", ";", ":", ",".

<sup>15</sup>We used version 3.8 for Linux of the Weka toolkit (Hall et al. 2009) and experimented with several other classifiers that have interpretable models (decision trees), but also support vector machines and naive Bayes classifiers. All of these underperformed on our test set.

### 3 Compositionality in English deverbal compounds: The role of the head

for each suffix and removed these from the training data to be put in the test data. We expect mid-frequency heads to lead to the most reliable results, because high-frequency heads may show higher levels of idiosyncrasy and low-frequency heads may suffer from data sparseness.<sup>16</sup> This resulted in a division of roughly 90% training and 10% testing data.<sup>17</sup> The data set resulting from the annotation effort is skewed, with obj being the majority class. Our selection of test instances introduces further differences in the proportions of obj and nobj in the test and training set. Therefore, we balanced both the training and test set by randomly removing instances with the obj relation (the largest class) until both classes were equal in size.<sup>18</sup> The balanced training set consisted of 1248 examples, and the test set of 132 examples.
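The head-disjoint split and the balancing step can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure as actually implemented: the dictionary keys `head` and `label`, the function names, and the fixed random seed are hypothetical.

```python
import random


def split_by_head(examples, test_heads):
    """Put every example whose head is in `test_heads` into the test set,
    so that no test head is ever seen during training.

    examples: list of dicts with (assumed) keys "head" and "label".
    """
    train = [ex for ex in examples if ex["head"] not in test_heads]
    test = [ex for ex in examples if ex["head"] in test_heads]
    return train, test


def balance(examples, seed=0):
    """Randomly downsample so that all classes have the size of the
    smallest class. With two classes this amounts to removing random
    instances of the majority class until both are equal in size."""
    by_label = {}
    for ex in examples:
        by_label.setdefault(ex["label"], []).append(ex)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced
```

Because the downsampling is random, two runs with different seeds yield slightly different balanced sets, which is the kind of variation footnote 20 mentions for the additional baselines.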

We compared our models with the random baseline, and two additional baselines to make sure that the features we are proposing are not just a by-product of the impact of simpler variables. We computed the relative<sup>19</sup> frequency of the head and the relative family size, i.e., how many compound types we find with a given head.<sup>20</sup>
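As a rough illustration of the family-size baseline, the sketch below counts the compound types per head and normalizes by the total number of compound types. The normalization is an assumption made here for concreteness; the text only states that relative counts were used to keep the baselines on the same scale as the other features. The function name and the `(non_head, head)` input format are likewise hypothetical.

```python
from collections import defaultdict


def relative_family_size(compounds):
    """Relative family size per head.

    compounds: iterable of (non_head, head) pairs, one per corpus occurrence.
    Returns {head: number of distinct compound types with that head,
             divided by the total number of compound types}.
    """
    families = defaultdict(set)
    for non_head, head in compounds:
        families[head].add(non_head)  # a set keeps each compound type once
    total_types = sum(len(members) for members in families.values())
    return {head: len(members) / total_types for head, members in families.items()}
```

For example, a toy corpus with two occurrences of *book proposal*, one of *marriage proposal*, and one of *job creation* contains three compound types, two of which belong to the family of *proposal*.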

We ran ablation experiments to determine the individual contribution of each feature in addition to the other features. However, because features might be interdependent and one feature could overshadow another, we first looked at the performance of each feature individually. This way, we could measure the exact predictive power of each individual feature in comparison to the baselines. Lastly, we combined the top-*n* features from ablation experiments and individual feature experiments to see the overall predictive potential of the model.
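The three experimental steps (single-feature models, ablation, and a combination of the top features) follow a generic scheme that can be sketched as below. The `evaluate` callable stands for "train on the training set with this feature subset and return test accuracy"; it, the toy gains, and the feature names `f1` to `f3` are hypothetical placeholders and do not correspond to any feature or result in this study.

```python
def single_feature_scores(features, evaluate):
    """Score of a model that uses each feature on its own."""
    return {f: evaluate([f]) for f in features}


def ablation_drops(features, evaluate):
    """Drop in score when one feature is removed from the full set."""
    full = evaluate(list(features))
    return {
        f: full - evaluate([g for g in features if g != f])
        for f in features
    }


def top_n(scores, n):
    """Names of the n entries with the highest values."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical stand-in for training and testing a classifier.
# The gains are additive here, so the toy shows no feature interaction.
TOY_GAIN = {"f1": 0.20, "f2": 0.05, "f3": 0.01}


def toy_evaluate(subset):
    return 0.5 + sum(TOY_GAIN[f] for f in subset)


features = list(TOY_GAIN)
singles = single_feature_scores(features, toy_evaluate)
drops = ablation_drops(features, toy_evaluate)
# Combine the best features from both experiments into one candidate model.
selected = sorted(set(top_n(singles, 2)) | set(top_n(drops, 2)))
combined_score = toy_evaluate(selected)
```

With real, interdependent features the two rankings can differ: a feature that is weak on its own may still cause a large drop when removed, which is why both experiments are run before combining.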

The first row in Table 4 shows that, when using all features, the classifier significantly outperforms<sup>21</sup> the baselines by a large margin (78.8%). This shows that the combination of features driven by linguistic theory has strong predictive power.

<sup>16</sup>We remind the reader that our goal is not to determine the realistic performance of our model, but to measure the contribution of the features. Therefore we believe that the bias introduced by selecting mid-frequency items for the test set is acceptable.

<sup>17</sup>Multiple divisions of training and test data would lead to more reliable results, but we have to leave this for future work.

<sup>18</sup>We also ran experiments with non-balanced data, because we reasoned that more data might result in higher performance, but the performance proved to be comparable. A balanced dataset facilitates comparisons to the random baseline of 50%.

<sup>19</sup>By providing relative counts, we make sure these features are on the same scale as our other features.

<sup>20</sup>These additional baselines were computed on a slightly different test and training set, due to the random process in balancing the data.

<sup>21</sup>Significance numbers for these experiments, in which training and test data are fixed, were computed with a McNemar test with *p* < 0.05, as it makes relatively few type I errors (Dietterich 1998).
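For readers unfamiliar with the McNemar test, the following is a minimal sketch of one common variant, the chi-square approximation with continuity correction. Whether this variant or the exact binomial version was used here is not stated, so the choice is an assumption, as are the function name and the example counts.

```python
import math


def mcnemar_p(b, c):
    """Two-tailed p-value of McNemar's test (chi-square approximation
    with continuity correction, one degree of freedom).

    b: test items classified correctly by model A but not by model B.
    c: test items classified correctly by model B but not by model A.
    Items on which both models agree do not enter the statistic.
    """
    if b + c == 0:
        return 1.0  # the models never disagree
    # Correction clamped at zero so that b == c yields p = 1.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 d.o.f.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical discordant counts, chosen only for illustration.
p = mcnemar_p(3, 0)
significant = p < 0.05
```

With so few discordant items the difference between the two hypothetical models is not significant at the 0.05 level, which shows why small accuracy differences on a test set of this size often fail to reach significance.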

### Gianina Iordăchioaia, Lonneke van der Plas & Glorianna Jagfeld

Table 4: Percent accuracy for individual features. "†" indicates a statistically significant difference from the performance of all features. All results are statistically significant in comparison to the baselines.


With respect to the upper bound, we cannot directly compare the numbers in Table 4 with the IAA reported in Section 3.3.1, because the data we use for testing and training includes only examples on which all annotators agree; neither can we use the 100% IAA on this selected test set as an upper bound. We expect the IAA for this high-agreement test set to lie between 100% and the 81.5% reported in §3.3.1 for the complete dataset and two annotators. The 78.8% we attain is not too far from the upper bound we can estimate from these IAA values.<sup>22</sup>

Furthermore, the results for the individual features in Table 4 show that each feature outperforms the baselines significantly. This means that each feature contributes significantly to the prediction of the relation. The 78.0% performance of the model that combines the top-2 features is comparable to the 78.8% of the model that includes all features. This means that although all features contribute to the quality of the prediction of the model individually, the best features overshadow the effect of the less well-performing features.

<sup>22</sup>A realistic upper bound for the test set could be determined by getting an independent annotator to annotate the items in the test set and measuring the agreement with the previous annotations. We leave this for future work.

Table 5 shows the results from the ablation experiments. Only the removal of the features *suffix*, *of\_outside\_DC*, and *P-R* results in a significant drop in performance, which means that their contribution in addition to the other features is particularly important. Their performance together is not significantly higher than that of all features (cf. 80.3% vs. 78.8%).



For the sake of comparison, Table 6 shows the results of a model using corpus-based features only, i.e., the data does not include the *P-R* feature that is based on human judgments. As in Table 5, we see that the features *of\_outside\_DC* and *suffix* are particularly important in this model as well, since their absence triggers a significant drop in performance. In this model, however, the contribution of the feature *by\_outside\_DC* also becomes significant, in contrast to the model in Table 5, which included the *P-R* feature.

Table 7 shows the direction of the prediction of the features in all three models (Tables 4 to 6). In other words, it shows whether higher values of a given feature indicate higher chances of an obj or a nobj relation. We gathered these directions by inspecting the coefficients of the logistic regression model.<sup>23</sup>
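How a direction of prediction can be read off a coefficient is sketched below for a single-feature model. The plain gradient-descent fit is a stand-in written for this illustration; it is not the Weka implementation used in the study, and the function names, the learning rate, the number of epochs, and the coding of obj as 1 are assumptions.

```python
import math


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent.

    xs: feature values; ys: labels coded as 1 (obj) or 0 (nobj).
    Returns the coefficient w and the intercept b.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def direction(w):
    """A positive coefficient means higher feature values favor obj."""
    return "obj" if w > 0 else "nobj"
```

In a model with several collinear features the sign of an individual coefficient is harder to interpret, since each coefficient is conditional on the other features being held constant; reading the sign from single-feature models avoids this issue.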

<sup>23</sup>We inspected the weights in the models as well, but they are not very informative, because there is a high level of collinearity in the features and the weights are calculated based on all other features staying equal. For this reason we report results on single feature models and ablation tests instead.


Table 6: Ablation experiment with corpus-based morphosyntactic features (no P-R). "†" indicates a statistically significant difference from the performance of all features.

Table 7: Direction of prediction per feature in different models. Consistent values across studies in bold.


## **6 Discussion**

In what follows we offer a detailed discussion of our results and interpret them in view of our initial hypothesis (Section 6.1). We then show their implications for compositionality and for our starting hypothesis (Section 6.2). In the end we present the main comparison points with respect to previous NLP literature (Section 6.3).

## **6.1 Interpretation of results**

### **6.1.1** *Process-vs-result* **(***P-R***)**

According to Table 4, the best individual feature is the *process vs. result* reading of the DC with 76.5% accuracy. The accuracy resulting from the combined model with all features (78.8%) is not significantly higher (McNemar two-tailed *p*-value of 0.2482), showing that this single feature is indeed very strong, and stronger than any of the morphosyntactic features on their own or in combination (cf. Table 6). This is not surprising, given that this feature encodes direct estimates for the ASN-hood of the head based on introspection.<sup>24</sup> In the ablation experiment in Table 5, *P-R* also proves to be very strong, since its removal yields a significantly lower result (72.0% vs. 78.8%), the lowest in this experiment. Still, the ablation study shows that removing *of-outside* is as detrimental to the model as removing *P-R*. This indicates that these two features capture characteristics that complement the rest of the morphosyntactic features to a similar extent.

Importantly, in line with our hypothesis, an increase in the *P-R* value correlates with an obj interpretation of the compounds in both experiments (see Table 7). To be precise, the *P-R* feature is so designed that a high value indicates that the DC is headed by an ASN, which parallels the verbal construction in (6). Given that such a compositional structure requires the object to be realized first, the fact that a high *P-R* value correlates with an obj reading of the DC in our models confirms our hypothesis that compositional DCs involve object non-heads.

The two columns in Table 8 illustrate pairs of DCs which, despite having the same head, reveal contrasting *P-R* values. In these examples, one can see that whenever the DC pair differs between an obj and a nobj reading, the obj reading receives the higher *P-R* value. This is predicted by our hypothesis and also supported by the results in Table 7. However, we also find examples with two

<sup>24</sup>It is interesting to see though that manual annotation was better at predicting ASN-hood than any of the features, in spite of the huge corpus we used. This suggests that we need even larger corpora to make up for the performance of (expensive) manual annotation.

considerably different *P-R* values under the same obj (or nobj) interpretation, which shows that there is no one-to-one correspondence between a (high) process reading and an obj interpretation of the DC.<sup>25</sup>


Table 8: DC pairs with contrasting *P-R* values

The confusion matrix for the feature *P-R* in Table 9 confirms that the machine learning algorithm was not able to find a clear cut-off value for this feature above which we find only obj readings. The *P-R* feature misclassifies 18 obj-DCs as nobj, and 13 nobj-DCs as obj. Examples of the former case are the obj-DCs in the second column of Table 8, which have a low *P-R* value, because they involve RN heads (see *temperature reading*, *alcohol tolerance*). In the latter case, the errors concern the nobj-DCs from the first column of Table 8, which have a high *P-R* value (see *career counseling* and *nicotine withdrawal*).
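As a consistency check, the two error counts can be related to the accuracy of the single-feature *P-R* model. The sketch assumes that this model was evaluated on the balanced test set of 132 examples, i.e., 66 obj-DCs and 66 nobj-DCs; the function name is hypothetical.

```python
def accuracy_from_errors(n_per_class, obj_as_nobj, nobj_as_obj):
    """Accuracy on a balanced two-class test set, given the two error counts."""
    total = 2 * n_per_class
    correct = total - obj_as_nobj - nobj_as_obj
    return correct / total


# 132 balanced test examples -> 66 per class (assumption stated above).
correct_obj = 66 - 18   # obj-DCs correctly classified: 48
correct_nobj = 66 - 13  # nobj-DCs correctly classified: 53
acc = accuracy_from_errors(66, obj_as_nobj=18, nobj_as_obj=13)
print(f"{correct_obj + correct_nobj} / 132 = {acc:.1%}")  # 101 / 132 = 76.5%
```

The result agrees with the 76.5% reported for *P-R* as an individual feature.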

Table 9: Confusion matrix for *P-R*


<sup>25</sup>nobj-DCs with a high *P-R* value are usually headed by simple event nominals like the nouns in (11c, d).

In our study, the *P-R* annotation feature comes closest to the transparency rating of compounds carried out in some NLP studies (cf. Section 2.2). The difference is that we correlated the rating with the semantics of the base verb in combination with its argument or adjunct, following Grimshaw's (1990) insight. At the same time, our design primarily targeted compositionality.

### **6.1.2** *of\_outside\_DC*

The next most important feature in our endeavor to capture compositionality in DCs is the realization of an *of*-phrase by the deverbal noun. This feature is intended to measure how often the deverbal noun realizes an *of*-phrase introducing the object argument, when appearing outside DCs. If the head noun of a DC shows a high tendency to realize *of*-phrases introducing objects, we expect it to also require object non-heads in DCs.

Although on its own the feature *of\_outside\_DC* yields an accuracy of only 59.8% (see Table 4; not significantly lower than the next higher value of 61.4%), the ablation study in Table 5 shows that its removal is just as detrimental for the system as the removal of the *P-R* feature: The accuracy drops from 78.8% to 72.0%. Similarly, in the model with corpus-based morphosyntactic features in Table 6, its removal triggers the largest drop, showing that in combination with the other features, the contribution of *of\_outside\_DC* is very important. This confirms Grimshaw's claim that the realization of the object argument is essential in identifying ASNs. Even more important for our hypothesis is the fact that *of\_outside\_DC* systematically correlates with an obj-DC in all our models (see Table 7). That is, to the extent that this feature identifies DCs with ASN heads, a high value indicates an object reading for the DC, as expected under our hypothesis.

The question is why the *of\_outside\_DC* feature does not score better than 59.8% on its own. First, as shown in Section 2.1.1, the presence of an *of*-phrase per se, as extracted from the corpus, is no guarantee for ASN-hood, since *of*-phrases may introduce possessive modifiers of RNs, besides the object arguments of ASNs. Second, even in their ASN reading, deverbal nouns attested in corpora do not always realize their object arguments (cf. Grimm & McNally 2013).

The samples in Table 10 show various mismatches between the realization of *of*-phrases and the formation of obj-DCs. For instance, *avoidance* and *preservation*, which build only obj-DCs in our database, have fewer occurrences with an *of*-phrase than *creation*, which forms only 72.7% obj-DCs. Moreover, *proposal*, which forms a high proportion of obj-DCs, realizes *of*-phrases in only 1.0% of its occurrences. In spite of the many obj-DCs like *book/contract/marriage/investment proposal*, the verbal relation is lost in this noun. It mostly functions


Table 10: Head nouns with (in)frequent *of*-phrases. Outliers in bold.

as an RN, i.e., it refers to the proposal made, and not to the process/event of proposing. In confirmation of this, these DCs received a *P-R* rating as low as 20% to 26.7%. This is an example of how our individual features complement each other.

The confusion matrix for the feature *of\_outside\_DC* in Table 11 shows indeed that the model based on this feature makes many false predictions; notably, it attributes 38 obj readings to DCs that in fact have a nobj reading. This means that the prediction power of *of\_outside\_DC* is misled by the presence of *of*-phrases with head nouns that form nobj-DCs (see Table 10). These DCs involve RN heads, which realize *of*-phrases as modifiers and not object arguments. The head noun *assassination* in Table 10 is one example. That this noun behaves like an RN is confirmed by the *P-R* value of the DCs it forms, which is below the average of 60%. A similar problem is posed by the DCs headed by, e.g., *creation*, which also allows RN readings and forms nobj-DCs, in spite of the high frequency with *of*-phrases (Table 10). In these critical cases, the results in Tables 5 and 6 show that the other morphosyntactic features compensate for the errors made by the *of\_outside\_DC* feature, helping the model.

Table 11: Confusion matrix for *of\_outside\_DC*


All in all, when comparing *of\_outside\_DC* with *P-R* in the ablation study, their contribution in combination with the other corpus features is similar. The difference is that the other features negatively affect the 76.5% individual contribution of *P-R* (cf. 72%), while they substantially improve the 59.8% contribution of *of\_outside\_DC* (cf. Table 4). Thus, the contribution of *of\_outside\_DC* greatly relies on the other ASN-features in the ablation models in Tables 5 and 6. This is not surprising, given the ambiguity of *of*-phrases, a reason for which Grimshaw (1990) used this test in combination with others (see Section 2.1.1). The contrast between *P-R* and *of\_outside\_DC* is also expected, since *P-R* is manually annotated and targets the underlying ASN-hood of the deverbal noun; the corpus features can only capture some aspects of it.

### **6.1.3** *Suffix*

*Suffix* is an important feature in all our models (see Tables 4, 5, and 6). It is the strongest morphosyntactic feature, as we can see from the performance of the individual features in Table 4, and has additional predictive power compared to the combination of all features (see Tables 5 and 6). However, Table 7 demonstrates a high variance in the direction of prediction of each suffix. Except for *-ment*, which correlates with obj readings, none of them is constant across models.

As noted in Section 4.1, the theoretical literature does not offer much on the role of suffixes in the ASN vs. RN disambiguation of deverbal nouns. Grimshaw (1990) and Borer (2013) suggest that *-ing* should form ASNs, which is disconfirmed by some data and by our models, where *-ing* oscillates between obj- and nobj-DCs. It is difficult to draw any conclusions on the role of the *suffix* feature for our compositionality hypothesis for two reasons. First, more theoretical research must be pursued to draw some definite conclusions on possible correlations between suffixes and ASN-hood, since the one suffix that was expected to show a preference did not. Second, we must also consider that the dataset of DCs for each suffix was five times smaller than for the other features in our study: i.e., the feature *suffix* subsumes five different suffix features. The small dataset may also be a reason for the inconclusiveness of the results in Table 7.<sup>26</sup>

The high variation between obj and nobj readings in Table 7 indicates that the valuable contribution of the *suffix* feature in the prediction task (72.0% in Table 4) comes from the complementarity between the individual suffixes. Similarly, in the ablation models in Tables 5 and 6, the contribution of the suffixes – which,

<sup>26</sup>To check correlations between individual suffixes and ASN-hood, one could measure how the *suffix* feature fares with respect to the *P-R* value and not the obj-nobj readings of DCs. This, however, would digress from the focus of this paper and we leave it for future research.

recall, is independent of Grimshaw's tests – is complementary to the features that diagnose ASN-hood. Thus, the *suffix* feature is not informative about the relation between compositionality and interpretation in DCs, but improves the predictive power of the models.

### **6.1.4** *sg\_outside\_DC* **and** *sg\_inside\_DC*

The frequency of the noun head in a singular form, whether outside or inside a DC, yields similar accuracy levels (68.9% in Table 4, 78.8% and 80.3% without a significant difference in Table 5, and 72% in Table 6). This similarity supports our assumption that within DCs the head nouns should preserve the properties from outside DCs (see Section 4.1). However, an interesting difference appears with respect to the direction of prediction, since only *sg\_outside\_DC* consistently predicts obj-DCs across all the models in Table 7, while *sg\_inside\_DC* is less reliable. This suggests that Grimshaw's morphosyntactic ASN-properties may be more reliable when the deverbal noun appears outside a DC than inside DCs.<sup>27</sup>

### **6.1.5** *head\_in\_DC* **(compoundhood)**

As an individual feature, the accuracy of *head\_in\_DC* is just above average among the other features in the present study (see Table 4). Its removal in our ablation experiments yields slight and non-significant drops in accuracy. In Section 4.1, we conjectured that an obj reading of DCs whose head nouns present high compoundhood would show us that a compositional construction with an object non-head is very likely to form DCs. The direction of prediction in Table 7 indicates that high values of this feature consistently correlate with obj-DCs, supporting this assumption. However, why does this feature not perform better? Our full database shows that its values are not informative enough: there are a few head nouns which display high compoundhood and frequently form obj-DCs, but the majority of DCs have very low such values. Only 5.1% of our DCs have a *head\_in\_DC* value above 50% and as many as 70.3% of them have one under 20%.

Table 12 illustrates the few head nouns that most often appear in DCs and the frequency of an obj reading among the DCs they appear as heads of. As visible there, a high frequency of a deverbal noun in DCs correlates with a high value for an obj reading of the compound's non-head, as predicted (cf. Section 4.1).

<sup>27</sup>The *inside* features do not damage our model, since removing *sg\_inside\_DC* and *by\_inside\_DC* from the ablation model yielded 77.3% accuracy – lower than 78.8% for all features together, though not significantly so.


Table 12: Head nouns with high compoundhood

### **6.1.6** *sum-adjectives* **and** *by***-phrases**

The last three features we employed in our study are *sum-adjectives*, *by\_outside\_DC* and *by\_inside\_DC*. On their own, they have some predictive power (Table 4), but their removal in Table 5 has no significant impact on the results, showing that *P-R* compensates for their absence. Interestingly, in the corpus-based morphosyntactic model in Table 6, the removal of *by\_outside\_DC* triggers a significant drop, indicating that in the absence of *P-R*, this feature becomes important. Yet, in spite of our expectation for this feature to identify obj-DCs, its direction of prediction is nobj in all models (see Table 7). As we saw in Section 2.1.1, *by*-phrases are ambiguous and their presence indicates ASN-hood only when the object argument is also realized (see 10). We considered using the frequency of *by*-phrases co-occurring with *of*-phrases, but the numbers were extremely low. Thus, the unexpected direction of prediction of *by*-phrases might be due to their ambiguity. The other two features do not preserve the direction of prediction (Table 7).

The inconclusiveness of these three features most likely resides in data sparsity. Namely, for the feature *by\_outside\_DC* the range of frequency in our full database is 0–6.22% with 60% of the deverbal head nouns realizing a *by*-phrase in fewer than 1% of their occurrences outside DCs. For *by\_inside\_DC* the range is between 0% and 4.36%, with 74% of the DCs displaying a *by*-phrase in fewer than 1% of the cases. For *sum-adjectives* the value is even lower: the frequency ranges between 0% and 1.8%, with 99% of the cases having a value under 1%.

### **6.1.7 Summary**

In summary, *P-R*, the feature based on introspection, is the strongest. It provides a high performance individually and its removal from the model considerably

hurts the results. *Suffix* is the strongest morphosyntactic feature. It brings additional value over the combination of all features including *P-R*, but it does not reach the performance of *P-R* on its own. *Of-outside* is the next valuable feature. On its own, it is not very strong, but it is a very important addition to the other features. Its removal from the combined models hurts the performance considerably. The feature *by-outside* is valuable when only corpus-based morphosyntactic features are considered. If *P-R* is present in the model, *by-outside* is unimportant. This indicates that this feature has a considerable overlap with *P-R*. The other features all have predictive power, but their additional predictive power is not very important. They capture the same signal in a less reliable way.

The latent variable that we are trying to capture with the features presented in this study, the ASN-hood of the head, is best represented by the introspection-based feature *P-R*. The morphosyntactic features *suffix* and *of-outside* have additional value in the combined model, which includes *P-R*, as the ablation studies show. They seem to help the strong feature *P-R* to move the model in the right direction. However, although the combination of *P-R* and the best morphosyntactic features leads to an improvement (80.3% vs. 78.8%), we could not prove that their addition to *P-R* as a single feature model improves the results significantly.

## **6.2 Implications for our hypothesis**

We have identified four features which are important for the interpretation of DCs: *P-R*, *of\_outside\_DC*, *by\_outside\_DC*, and *suffix*. The first three were inspired by Grimshaw (1990) and later research in the same vein; the fourth was introduced by us. As mentioned in Section 6.1.3, the suffix does not tell us anything about ASN-hood or the compositionality of the DC. It is a morphological feature, which scores well on its own and better than most ASN-features from Grimshaw (1990); yet, in ablation studies, it is weaker than *of\_outside\_DC*, which is Grimshaw's most important ASN-feature.

The other three features all give us input on ASN-hood, but in different ways. An unexpected result comes from *by\_outside\_DC*, whose direction of prediction is for nobj-DCs, instead of obj-DCs. In Section 6.1.6, we reasoned that this is due to the ambiguity of *by*-phrases, which we could not eliminate by measuring their co-occurrence with *of*-phrases, given data sparsity. The only way we can interpret this result is that, in combination with other ASN-features which usually point to obj-DCs, the input from the ambiguous *by*-phrases was used by the model for the other direction, of nobj-DCs.

The features *P-R* and *of\_outside\_DC* are the most important for the ASN-hood of head nouns and the implicit compositional interpretation of DCs. They both

behave as predicted by our hypothesis. *P-R* represents human intuitions with respect to the ASN-hood of the head noun and scores best in our models. In addition, in line with Grimshaw's claims and our hypothesis, its direction of prediction consistently points to obj-DCs. *Of\_outside\_DC* is not very strong on its own, but extremely important in combination with the other ASN-features. This is in fact what Grimshaw's combined use of two or three of these morphosyntactic tests (in order to circumvent ambiguity) leads us to expect (see Section 2.1.1).

These results immediately confirm two things:


A further implication of these observations is that, indeed, the (deverbal) head noun plays a crucial role in the compositionality and overall transparency of DCs, a conclusion that was reached by other computational studies as well (see Section 2.2.2).

For the DCs whose heads fail to exhibit ASN-properties and behave like RNs, our features cannot get very far. These DCs behave like RCs, and the relation between their two parts may even be unrelated to the base verb and its modifiers. For these DCs, the addition of other features, especially some designed for non-heads, should improve the results. In this case, it would be worth including features from previous NLP work, which deals with noun-noun compounds in general, especially that reported in Section 2.2.2. We leave such a study for future research, since it departs from our focus here.

## **6.3 Comparison to other NLP approaches**

We mentioned in Section 2.2.1 that the aims of previous work on predicting the relation between heads and non-heads in DCs are different from ours. Whereas this work focuses on building classifiers that reach state-of-the-art performance on the task of predicting the relation between the head and the non-head of deverbal compounds, our interest lies in uncovering in how far the behavior of the derived nominals (as ASNs or RNs) can help in predicting the (compositional) relation between head and non-head. As a result, the datasets are very different.

However, we present here some meaningful comparisons with this work. In the two-class prediction task, Lapata (2002) reaches an accuracy of 86.1% compared to a baseline of 61.5%, i.e., 24.6 percentage points above the baseline. The accuracy we achieve is 80.3%, i.e., 30.3 percentage points above the 50% baseline of our balanced test set. Relative improvements are comparable. Note that the data set of Lapata (2002) included DCs ending in suffixes such as *-er* and *-ee*, which are biased in the relation they select. Including them in our dataset could have resulted in better accuracy overall and a stronger predictive power for the *suffix* feature.

Apart from the differences in the data set, we also see large differences in the type of features selected. In this paper we exclusively tested the predictive power of morphosyntactic features of the deverbal noun for determining the covert relation. In the future, it would be interesting to compare these to the encyclopedic/pragmatic features prevalent in the CL literature, by incorporating the latter into our models.

Schulte im Walde, Hätty & Bott (2016) evaluate the influence of several properties of the constituents (frequency, productivity and ambiguity) on the performance of the model in its predictions on transparency. Just as they attribute the influence of these properties to the underlying property of ambiguity, so do we attribute the non-compositionality in the relation between head and compounds (in RNs) to the greater underspecification of RNs in comparison to ASNs. Although we do not have access to transparency ratings for our DCs, we have gathered annotations on their process vs. result interpretation (see Section 3.3.2). This information can be seen as a proxy for the transparency of the head, because by default the more result-like the DC is, the less transparent it will be.

Furthermore, Schulte im Walde, Hätty & Bott (2016) emphasize the importance of properties of the head and the compound, and to a lesser extent of the modifier (i.e., non-head) for the prediction of the transparency of the compound. The authors stress the need to carefully balance datasets according to the empirical and semantic properties of the compounds, as well as of their heads. We have balanced our data set for corpus frequency of the head and measured the family size of the heads. We have not measured other properties that they have used, but will consider these in future work.

## **7 Conclusions**

In this paper we have presented a study on the (syntactic) compositionality of DCs, as predictable from the morphosyntactic properties of their head nouns. We have employed theoretical insights on the behavior of deverbal nominals, on the basis of which we collected corpus data, as well as manual annotations. We used this data collection in the form of indicative features in a logistic regression model, by means of which we evaluated the prediction power of each feature for the obj (vs. nobj) interpretation of the compounds.

Our approach to compositionality comes from the theoretical linguistic perspective according to which the compositionality of a complex expression (here, the DC) depends on the meanings of its parts, as well as the syntactic relationship between them. To the extent that DCs are headed by deverbal nouns, the fully compositional ones encode the syntactic-semantic relationship between the base verb and its object, while the less compositional ones are underspecified/ambiguous. This difference is traced back to the ambiguity of deverbal nouns between ASN and RN uses from Grimshaw (1990). ASNs preserve the compositional requirements of the base verb, while RNs do not.

Our results confirm our hypothesis that DCs with ASN-heads are compositional and receive an obj reading. This study, however, raises a few questions for future research. It especially highlights the need for more study on the role of individual suffixes in the interpretation of the deverbal noun, since previous claims on *-ing* as primarily building obj-DCs have not been confirmed. In addition, some tests which are popular in the theoretical literature (e.g., *in/for*-adverbials, agentive and aspectual adjectives, as well as *by*-phrases) could not be used or were not reliable enough as features, probably due to data sparsity. On the one hand, their low attestation in corpora casts doubt on their authenticity and calls for further empirical study. On the other hand, it also signals the need for even larger corpora in order to reliably test theoretical insights, a task at which human intuitions are considerably better, as shown by our *P-R* feature.

By comparison to the previous NLP work on the transparency of (root) compounds, we did not consider both constituents to evaluate the mapping with the compound; we focused on the head noun, which has a crucial influence on the relationship that it establishes with the non-head in DCs. In future work, we will consider including some predictive features of the non-head. We expect that the encyclopedic features exploited in the NLP literature such as in Nicholson & Baldwin (2006), Lapata (2002), and Grover et al. (2005) will benefit the disambiguation of RNs and the DCs headed by these.

## **Abbreviations**


## **Acknowledgements**

We are grateful to Katherine Fraser, Bethany Lochbihler and Whitney Frazier Peterson for annotating our database, to Kerstin Eckart and the INF project in the SFB 732 for important technical support, and to Alla Abrosimova for help with further technical details. This research has been funded by the German Research Foundation (DFG) via grants offered to the projects B1 *The form and interpretation of derived nominals* and D11 *A crosslingual approach to the analysis of compound nouns*, as part of the SFB 732 *Incremental specification in context*, as well as the project IO 91/1-1, all hosted at the University of Stuttgart.

## **References**



Chomsky, Noam. 1995. *The minimalist program*. Cambridge, MA: MIT Press.


### Gianina Iordăchioaia, Lonneke van der Plas & Glorianna Jagfeld




## **Chapter 4**

## **What can we learn from novel compounds?**

## Gary Libben

Brock University

This chapter focuses on the question of how novel compounds are processed. To address this question, morphologically unambiguous compounds such as *shotden* are contrasted with morphologically ambiguous compounds such as *clampeel* (which can have the constituent structure *clam-peel* or *clamp-eel*). I discuss how these strings can be seen as *lexical superstates* and present a proposal for how they are parsed. An experiment using progressive demasking and typing is reported. Typing results show evidence of activation of both versions of ambiguous compounds, supporting the view that all lexical substrings in a multiword expression that can be activated, will be activated. I claim that this type of activation is fundamental to the understanding of morphological effects in both the visual recognition and production of English words. Specifically, it enables the creation of *morphological superstates*, the flexible morphological structures that Libben (2017) claims characterize cognitive processing in lexical comprehension and production.

## **1 Background**

## **1.1 An illustrative example**

It might be worthwhile to begin with a non-laboratory example of the type of lexical processing that needs to be accounted for. The example begins with a furniture store in Vienna, named *FantasTeak*. To be sure, the highlighting of *Teak* in the name *FantasTeak* is extremely important, considering that it is a furniture store. But does the medial *T* in *FantasTeak* need to be capitalized? Removing the capitalization reveals that indeed it does! Without the medial capitalized *T*, *Fantasteak* seems to generate the activation of the subword *steak*. Indeed, *Fantasteak* is the name of a steak restaurant that opened in Campbelltown, Australia on Mother's Day 2019.


*Fantasteak* is a highly complex and ambiguous multiword expression. It has multiple interpretations that trade on the activation of subwords such as *steak*, *teak* and the drink *Fanta*, as well as orthographic and phonological similarity to the whole word *fantastic*. This suggests great voracity in the activation of subwords. Yet, not all subwords of the string appear to be activated. The string *Fantasteak* also contains the substrings *fan*, *ant*, *taste*, and *tea*. Although these are, on average, higher in frequency than either *steak* or *teak*, they seem to be relatively inaccessible within the string. The goal of this chapter is to explain why. Why are some substrings of ambiguous and unambiguous compounds activated while others are not, and what can this tell us about the fundamental nature of the cognitive operations involved in lexical processing?

Understanding how novel morphologically complex words are parsed is key to understanding how people expand their vocabularies and indeed how morphological productivity enables new words to enter the language. There seems to be good reason to believe that this morphological parsing and the activation of lexical substrings that it entails is not as simple and rigid as was previously thought. As Libben (2015) notes, this progression can be seen by tracing developments in the field starting with Taft & Forster (1975). They contrasted stimulus pairs such as *replicate* and *repertoire*, arguing that a word such as *replicate* is perceived by native speakers of English as prefixed, whereas a word such as *repertoire* is not prefixed. They predicted that, as a result, the novel prefixed form *deplicate* (containing the prefix *de-* and an existing morphological substring *-plicate*) will appear to be more word-like than the novel prefixed form *depertoire* (which does not contain an existing morphological substring). Indeed, Taft & Forster (1975) reported elevated rejection latencies for strings such as *deplicate* in a lexical decision task.

The Taft & Forster (1975) contribution was truly seminal. It was the first to invoke a process of lexical parsing and the activation of substrings to predict patterns of lexical processing across word types. Even more importantly, from my perspective, it linked the morphological structure of a word to the manner in which it is processed by native speakers. This implies that a word such as *replicate* has a prefix-stem structure because (and, perhaps, only because) people "strip off" the prefix during visual word processing.

In my view, considering the morphological structure of a word in terms of what people do when they recognize or produce it has very substantial advantages. It enables us to link the processing of novel words with the processing of existing words and it requires that we be explicit about the parsing processes that could enable the interpretation of prefixed, suffixed and compound words. Perhaps most importantly, it leads us to the view that words are actions, not things, and that morphology is what people do, not what people have.


## **1.2 Questions of lexical constituent structure**

This leads us directly to the question: So, what is the actual morphological structure of *Fantasteak*? Is it *Fanta-steak*, or *Fantas-steak*, or *Fantas-teak*? My answer to this question would be that the actual structure of *Fantasteak* is any one of those that a language user happens to need. Because *Fantasteak* is a novel creation, it does not "have" any morphological structure when people first see it. Rather, it is their actions that give the word morphological structure. And, as we can see, more than one set of actions is possible. Thus, I suggest that a word such as *Fantasteak* does not have a single fixed morphological structure. Rather, the characteristics of morphological processing in English create a situation in which there is both *steak* and *teak* in *Fantasteak*.

The formulation of questions like "Is there *teak* in *Fantasteak*?" has been at the heart of a line of psycholinguistic inquiry that has sought to isolate the conditions under which lexical substrings are and are not activated during processing. These include the *hat* in *that* (Bowers et al. 2005), the *broth* in *brothel* (Rastle et al. 2004), and the *corn* in *corner* (Longtin et al. 2003; Morris et al. 2008; Lehtonen et al. 2011; Lavric et al. 2012). For the most part, this literature has focused more on the drivers of morphological decomposition and less on the details of morphological parsing. A key question, for example, has been whether morphological decomposition of existing words can be driven by form-based factors (Beyersmann et al. 2016) or whether true morphological decomposition depends on semantic features of the word and of processing (Järvikivi & Pyykkönen 2011; Rueckl & Aicher 2008; Morris et al. 2007).

To be sure, understanding how morphological processing is influenced by formal factors and how it is influenced by lexical semantic characteristics (e.g., the semantic transparency of the whole word) is extremely important in the development of our understanding of online lexical processing. In this context, novel forms such as *Fantasteak* may have a special role to play. The alternate morphological parses that are available for this novel string may shed light on how putative constituents are identified and the conditions under which substrings such as *fantas-* can be treated by users of the language as word substrings (supported, presumably, by the semantic similarity among words such as *fantasia*, *fantastic*, and *fantasy*).

## **1.3 Ambiguous novel compounds and lexical superstates**

Libben et al. (1999) employed a type of novel morphological construction that they claimed is particularly revealing of the dynamics of morphological processing in general and morphological parsing in particular. They focused on ambiguous novel compounds. These are novel compounds such as *clampeel*, which can be parsed as either *clam-peel* or *clamp-eel*. They found that there was no general tendency for native speakers of English to adopt either a first parse (e.g., *clam-peel*) or last parse (e.g., *clamp-eel*) approach. Moreover, they found that ambiguous novel compounds such as *clampeel* show activation of all their potential constituents (e.g., *clam*, *clamp*, *peel*, and *eel*). In other words, the answer to the question "Is there a (*clam*, or *clamp* or *peel* or *eel*) in *clampeel*?" would simply be "Yes".

Findings such as these call into question the assumption that a given word will have a univocal morphological structure. Libben (2019) argues that this indeterminacy applies to the morphology of lexical structures in general. Under this view, a string such as *clampeel* can be described as being in a lexical superstate – a cognitive state that is best described by the opportunities for interpretation that it enables. The same description extends to existing words. Thus, an existing compound such as *keyboard* has, as a superstate, the whole word representation *keyboard* as well as the decomposed representation *key-board*. Analogously, an existing suffixed word such as *formality* can be best described as having the lexical superstate representation shown in Figure 1. In this figure, the string has a whole word representation as well as multiple decomposition possibilities. Which one of these is actually implemented in an act of lexical processing will depend on the specifics of the processing task, the individual language user, and the situation in which they are found.

Figure 1: An example lexical superstate representation. The word *formality* can be undecomposed, fully decomposed, have a suffix string (*-ality*) or a complex stem (*formal*).

Lexical superstate representations can also be effective in capturing the structural ambiguity of a word such as *unlockable* in Figure 2, which can be interpreted as 'not lockable' (*un-lockable*) or 'able to be unlocked' (*unlock-able*). Lexical superstate representations can also be applied to novel ambiguous strings such as *Fantasteak* and *clampeel*. These are shown in Figure 3. As can be seen in this figure, in many ways, the novel string *Fantasteak* is the more complex of the two. In order to capture the key features of *Fantasteak*, it is necessary to indicate that it is linked in an unspecified manner (represented by a dotted line) to the existing word *fantastic* (and *fantasy*, etc.). This acknowledges the likely source of both novel interpretations. It also leads to the requirement to accept a fuzzy parse such as *Fantas-steak*, in which the medial *s* is repeated.

Figure 2: Superstate representations for the structurally ambiguous word *unlockable*.

Figure 3: Superstate representations for novel compounds. *Fantasteak* is shown on the left. The dotted line indicates an association to the existing word *fantastic*. *Clampeel* is shown on the right.

The representation of *clampeel* in Figure 3 has a structure that has features in common with that of *Fantasteak*. It shows the possibility of a fuzzy parse in which the medial letter is repeated (in this case the medial *p* to enable the interpretation *clamp-peel*). Overall, however, *clampeel* is quite a bit more controlled and straightforward than *Fantasteak*. First, we can be relatively confident that *clam* and *clamp* are existing lexical strings of English. This is not necessarily the case for *Fanta* (the name of a drink produced by Coca-Cola) or *fantas* (which may or may not be a unit of recognition for speakers of English). Second, its interpretation cannot draw on the interpretation of a set of existing words (as is the case for *Fantasteak*). Ambiguous novel compounds such as *clampeel*, therefore, may constitute the stimulus type that would enable us to investigate the *Fantasteak* phenomenon under relatively controlled conditions. In addition, ambiguous novel compounds provide a testing ground for the investigation of how readers of English are able to make use of the advantages enabled by compound word productivity in the context of a writing system in which compound words are often written as single unspaced strings.

## **1.4 Fuzzy Forward Lexical Activation generates lexical superstate representations**

Why are English language users likely to find *clam* and *clamp* in *clampeel*? And why are they less likely to find the substrings *lamp*, *am*, and *amp*? Taft & Forster (1976) claimed that, fundamentally, morphological processing was a left-to-right process in the reading of English.

There is a good deal of evidence that supports the assumption that morphological activation is achieved through beginning-to-end processing. However, it is less clear that morphemes themselves have discrete representations in the mental lexicon (e.g., Baayen & Smolka 2019; Ramscar et al. 2018). In addition, phenomena such as the shared *s* in *Fantas-steak* suggest that an approach to parsing that requires that reference be made to fixed individual morphemes and individual letters in a word is likely to be problematic. A more useful approach to capturing how individuals identify constituent substrings of English words can be to simply posit a heuristic of Fuzzy Forward Lexical Activation. In this approach, processing always takes place from beginning to end. Initial letters of a word are scanned until a familiar initial lexical substring is encountered. When one is found, a final substring is computed from that position onward. If that final substring is also familiar, it is interpreted and the process continues. Thus, the strings *formality*, *clampeel*, *Fantasteak*, and *unlockable* would be processed in the manner shown in Table 1.

Table 1: Fuzzy Forward Lexical Activation for stimuli such as *formality*, *clampeel*, *Fantasteak*, and *unlockable*.

This parsing heuristic makes the claim that processing activity will generate patterns that correspond to both readings of a novel ambiguous word such as *clampeel*, as well as an existing structurally ambiguous word such as *unlockable*, simply by parsing them.

In addition, the parsing heuristic will generate both stem-suffix representations for the word *formality*, as well as an affix string representation. It will, however, neither generate the fully decomposed representation *form-al-ity* nor the fully decomposed representation *un-lock-able*. The heuristic therefore makes the empirical claim that English language users do not create such fully decomposed representations either. They are thus claimed to be potential lexical superstate representations that are not realized because of the dynamics of visual lexical processing in English.

Fuzzy Forward Lexical Activation is likely the simplest possible approach to English visual morphological parsing. Like the signs that one sees on London crosswalks to aid tourists, it says: "Look right →". By beginning at the beginning and looking right, it ensures that the key initializing activity in morphological processing is the activation of the initial substring of the word. Thus, although this approach to morphological processing differs from the prefix-stripping approach of Taft & Forster (1975), it has much the same effect. The processing of a word such as *unlockable* begins with the recognition of the prefix *un-*. From there, the parses *un-lockable* and *unlock-able* are created (under the assumption that the substrings *lockable*, *unlock* and *-able* are known to the language user). This feature of checking that the substring to the right is known to the language user ensures that the parse *form-ality* is possible under the assumption that a language user maintains a trace of suffix strings such as *-ality* (Derwing 2014; Libben et al. 2016). However, the potential parse *for-mality* would fail at the "look right" stage because the string *-mality* is unlikely to be known to the language user as a representation of English.
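The heuristic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation used in the study: the function name `fuzzy_forward_parses`, the toy `LEXICON`, and the treatment of suffix strings such as *-ality* as plain lexicon entries are all hypothetical choices made for the example. The fuzzy option lets the letter at the boundary be shared by both substrings, as in *Fantas-steak*.

```python
# Toy lexicon (an assumption for illustration): words and suffix strings
# that a language user is taken to know.
LEXICON = {
    "clam", "clamp", "peel", "eel", "lamp", "amp", "am",
    "fan", "ant", "tea", "taste", "fanta", "fantas", "steak", "teak",
    "for", "form", "formal", "ality", "ity",
    "un", "lock", "unlock", "lockable", "able",
}


def fuzzy_forward_parses(string, lexicon, allow_fuzzy=True):
    """Return the binary parses found by scanning from beginning to end."""
    s = string.lower()
    parses = []
    for i in range(1, len(s)):
        initial = s[:i]
        if initial not in lexicon:
            continue  # keep scanning until a familiar initial substring is found
        # "Look right": the rest of the string must itself be familiar.
        final = s[i:]
        if final in lexicon:
            parses.append((initial, final))
        # Fuzzy option: the boundary letter is shared by both substrings.
        if allow_fuzzy:
            shared = s[i - 1:]
            if shared in lexicon:
                parses.append((initial, shared))
    return parses


for word in ("formality", "clampeel", "Fantasteak", "unlockable"):
    print(word, fuzzy_forward_parses(word, LEXICON))
```

With this toy lexicon, *for-mality* is rejected because *mality* is not a known string, and no fully decomposed parse such as *un-lock-able* is produced, since the procedure only ever yields an initial substring and the remainder to its right. Medial substrings such as *lamp* or *taste* are never considered because they do not begin at the beginning of the string.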

Fuzzy Forward Lexical Activation constitutes an extremely simple approach to morphological parsing that, I claim, is linked directly to the lexical superstate representations shown in Figures 1, 2 and 3. Indeed, it creates them. Its functioning results in morphological processing that is primarily binary. The reason for this is that it must begin at the beginning and it must look toward the end of the word. The functioning of Fuzzy Forward Lexical Activation also results in what might be termed hierarchical morphological structure, so that a word such as *undrinkable* would be parsed as the right-branching structure *un-drinkable*, whereas a word such as *unlockable* will be parsed as both the right-branching structure *un-lockable* and the left-branching structure *unlock-able*.

## **1.5 Typing as a window to morphological processing**

This brings us to the question of how the predictions of the lexical superstate hypothesis and the proposed mechanisms of Fuzzy Forward Lexical Activation can be evaluated. A potentially revealing task is one that specifically targets lexical activation in left-to-right processing. I suggest that the online typing of words is exactly such a task. In online typing, a participant is presented with a lexical string and is asked to type it as quickly and as accurately as possible. For each word typed, it is possible to calculate overall per letter typing times as well as per letter typing times at specific locations in the word (Feldman et al. 2019; Libben et al. 2016; Sahel et al. 2008; Will et al. 2006).
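As a rough sketch of how per letter typing times might be derived from a keystroke log, consider the Python fragment below. The log format, the function names, and the timestamps are invented for illustration and are not data from the experiment. Measuring each letter from the preceding key press (and the first letter from the onset of typing) is likewise an assumption about how the latencies are defined.

```python
def letter_latencies(keystrokes, onset_ms=0):
    """Turn (letter, timestamp_ms) key presses into (letter, latency_ms) pairs."""
    latencies = []
    previous = onset_ms
    for letter, timestamp in keystrokes:
        latencies.append((letter, timestamp - previous))
        previous = timestamp
    return latencies


def mean_latency(latencies):
    """Overall per letter typing time for one typed word."""
    return sum(ms for _, ms in latencies) / len(latencies)


# Invented timestamps (ms since the start of typing) for the string "anklecob".
LOG = [("a", 300), ("n", 450), ("k", 610), ("l", 760),
       ("e", 900), ("c", 1250), ("o", 1400), ("b", 1560)]

times = letter_latencies(LOG)
boundary_index = 5  # 0-based position of "c", the first letter of "cob"

print("overall mean:", mean_latency(times))
print("at boundary:", times[boundary_index])
```

The same per letter values can then be inspected at any location of interest, such as the letters immediately before and after a constituent boundary.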

If, indeed, morphological structure for novel compounds is evident in online typing, we should see elevated response times at the morpheme boundary for unambiguous strings such as *anklecob*. This would correspond to the location at which participants recognize an initial string *ankle* and then look right to the end of the string, recognizing *cob*. For ambiguous novel compounds such as *clampeel*, however, the location of the putative morpheme boundary should be blurred and both potential parses should become part of the lexical superstate. We would expect elevated letter typing times at both the location between *clam* and *peel* and the location between *clamp* and *eel*. Our previous research has shown that the typing of morphologically complex words is characterized by elevated typing times at the constituent boundary (Libben et al. 2014; 2016). This difference in typing time may reflect morphological chunking in letter typing, so that a two-constituent compound word is typed as a sequence of two motor plans. Each motor plan would correspond to a compound constituent. The prediction regarding typing times for ambiguous novel compounds follows from this observation: If the production of ambiguous novel compounds involves the activation of all potential constituents, then we should expect that four motor plans are in play. This would result in "blurring", i.e., longer and lower latency spikes. Latency increases would be longer because they would be spread over two letter boundaries rather than one and they would be lower because each of those letter boundaries is at once a "between-constituent" location and a "within-constituent" location.

Thinking about constituent boundary effects in terms of motor plans suggests that a number of control variables also need to be tracked. The reason for this is that one would expect that the speed with which word typing motor plans are created and executed can be influenced by position in the word and by the frequency of particular letter combinations in the language. Moreover, particular attention would need to be paid to letter co-occurrence frequencies at the constituent boundary itself. These predictions and analytic considerations were tested in the experiment described below.

## **2 Method**

## **2.1 Participants**

Twenty-four native English speakers between the ages of 17 and 26 participated in the study. All reported English to be their mother tongue and none had learned a second language before age ten. All participants had normal or corrected-to-normal vision. They were all university students from a variety of departments of Brock University who received either course credit or \$15 for their participation.

## **2.2 Stimuli**

In total, participants viewed and produced 45 stimuli, all of which were novel noun-noun English compounds. Fifteen of these were unambiguous novel compounds and thirty were ambiguous novel compounds. The ambiguous novel compound stimuli were created by extracting all nouns from the CELEX database (Baayen et al. 1995) and then identifying which of those also created nouns when their last letter was removed. This created a candidate first constituent pair (e.g., *clam*, *clamp*). Each such pair was then linked to a noun in the CELEX database that began with the last letter of the longer member of the pair (in this case, *p*) and which also created a noun when its first letter was removed (e.g., *peel*, *eel*). This process of selection created the set of English novel noun-noun compound stimuli such as *clampeel*. The resulting set of 45 stimuli is shown in Table 2. These were created so that constituents were comparable in frequency and length. The set of ambiguous novel compounds was subdivided into those in which the grapheme-to-phoneme relations were different in each of the two parses and those for which the grapheme-to-phoneme relations were essentially the same. The difference between these two subgroups can be easily appreciated by reading aloud the stimuli in the middle column of Table 2 and reading aloud those in the final column of the table. The ambiguous stimuli with sound change shown in the middle column are those such as *babelarch*. As *babe-larch*, the first constituent is one syllable in length and has the initial vowel /ej/. As *babel-arch*, the first constituent is two syllables in length and has the initial vowel /æ/. In contrast, *clampeel*, as the first stimulus in the third column of Table 2, has essentially the same phonological realization as *clam-peel* and *clamp-eel* (assuming co-articulation and other effects associated with the morpheme boundary).
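The selection procedure for ambiguous stimuli can be sketched as follows. This is an illustrative reconstruction in Python, not the script used to build the stimulus set: the function name `ambiguous_compounds` and the small noun list standing in for the CELEX nouns are assumptions, and the matching of constituents for frequency and length is left out.

```python
def ambiguous_compounds(nouns):
    """Find strings with two noun-noun parses that differ by one boundary letter.

    Returns tuples of (string, first_parse, last_parse).
    """
    nouns = set(nouns)
    found = []
    for longer in sorted(nouns):
        shorter = longer[:-1]          # e.g. "clamp" -> "clam"
        if shorter not in nouns:
            continue
        pivot = longer[-1]             # the letter shared across parses, e.g. "p"
        for second in sorted(nouns):
            # The second constituent must start with the pivot letter and
            # still be a noun when that letter is removed ("peel" -> "eel").
            if second.startswith(pivot) and second[1:] in nouns:
                found.append((shorter + second,
                              (shorter, second),
                              (longer, second[1:])))
    return found


# Stand-in noun list (an assumption; the study drew its nouns from CELEX).
NOUNS = ["clam", "clamp", "peel", "eel", "babe", "babel",
         "larch", "arch", "ankle", "cob"]

for string, first_parse, last_parse in ambiguous_compounds(NOUNS):
    print(string, first_parse, last_parse)
```

On this stand-in list the procedure yields *babelarch* and *clampeel*, each with its two parses, while nouns such as *ankle* and *cob* never enter an ambiguous string.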

Table 2: The novel compound stimuli in the progressive demasking and typing tasks. Unambiguous stimuli (e.g., *anklecob*) have a single morphological parse (e.g., *ankle* + *cob*). Ambiguous stimuli (e.g., *clampeel*, *babelarch*) have two possible morphological parses. For the ones with sound change, the pronunciation of graphemes depends on the parse (e.g., *babe* + *larch*, *babel* + *arch*). For ambiguous stimuli without sound change, it does not (e.g., *clam* + *peel*, *clamp* + *eel*).



## **3 Apparatus and procedure**

The procedure employed in the present study was implemented in Psyscope X, running on a MacBook Pro, using an IO Labs button box Voice Key.

We used a combined progressive demasking and typing paradigm as developed by Libben et al. (2012) and Libben et al. (2014). In this paradigm, participants first see a word being progressively demasked in the center of a computer screen and must identify it as quickly as possible. After the word is identified (either by saying it aloud or by pressing the return key), the stimulus disappears. The participant is then asked to type it as quickly and as accurately as possible.

All 24 participants identified and typed all 45 stimulus words. Stimuli were presented in a different random order for each participant. Testing was conducted in a single block of trials and the main experiment was preceded by a practice session of six trials.

Each trial consisted of two components: a progressive demasking component and a typing component, with an inter-trial interval of two seconds. Thus, in the first trial of the experiment, a participant would see a word being progressively demasked and would identify it. This progressive demasking component would be immediately followed by the typing component. The screen would go blank and the participant would type the word using the keyboard of the MacBook Pro. Participants pressed the return key after they had typed the last letter of the stimulus word. That action initiated the appearance of the target stimulus in the center of the screen. Participants were asked to verify that this was in fact the stimulus that they saw by pressing a key marked "yes" or "no" on the keyboard (all participants responded "yes" to all words). Their pressing of the "yes" key ended the trial. The screen then went blank for 2 seconds, after which the progressive demasking component of the next trial began. The details of each of the two trial components are presented below.

## **3.1 The progressive demasking component**

The key feature of progressive demasking is that words appear very slowly, as though they were emerging from a fog. This effect was created in this experiment by alternating the presentation of a stimulus word and a pattern mask of cross hatches (##########) over 18 cycles. Each cycle was 300 ms in length. In the first cycle, the target word is presented for only 16 ms and the mask is presented for 284 ms. In the next cycle, the target is presented for 16 ms longer than in the previous cycle and the mask is presented for 16 ms less (i.e., 32 ms and 268 ms respectively). Thus, in each successive cycle, the target word becomes more visible. Typically, for real and novel compound words, the stimulus word is identified in under 10 cycles (3,000 ms). In the present study, participants were randomly assigned to one of two progressive demasking procedures. The difference between the groups was that Group 1 participants were asked to press the keyboard return key as soon as they could identify the word. Group 2 participants were asked to say the word aloud as soon as they could identify it. These two groups were created in order to test whether saying the stimuli aloud in the progressive demasking task would have the effect of disambiguating the ambiguous stimuli in the subsequent typing task.
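The timing of the demasking cycles follows directly from the figures given above and can be written out as a short Python sketch. The function name `demasking_schedule` and its parameter names are hypothetical; the values (18 cycles of 300 ms, with the target gaining 16 ms per cycle) are those stated in the text.

```python
def demasking_schedule(n_cycles=18, cycle_ms=300, step_ms=16):
    """Return (target_ms, mask_ms) for each cycle of progressive demasking."""
    schedule = []
    for cycle in range(1, n_cycles + 1):
        target_ms = cycle * step_ms        # target grows by 16 ms per cycle
        mask_ms = cycle_ms - target_ms     # mask shrinks by the same amount
        schedule.append((target_ms, mask_ms))
    return schedule


schedule = demasking_schedule()
for cycle, (target_ms, mask_ms) in enumerate(schedule, start=1):
    elapsed_ms = cycle * 300
    print(cycle, target_ms, mask_ms, elapsed_ms)
```

Under these values, the target is shown for 288 ms and the mask for 12 ms in the final cycle, and identification "in under 10 cycles" corresponds to less than 3,000 ms of elapsed presentation time.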

## **3.2 The typing component**

For each stimulus, the typing component of the task began immediately following the participant's identification of the progressively demasked stimulus. Typing was done on a standard laptop keyboard, and the letters that the participant typed were visible on the screen (as is the case in normal typing). Participants were able to self-correct during word typing by pressing the backspace key. As soon as they finished typing the word, they pressed the return key. This ended the typing component of the trial.

## **4 Results**

Our analysis focused on trials in which stimuli were typed without error (i.e., the word produced was that which was presented and was typed without the backspace key having been pressed). The overall accuracy rate, defined in this way, was 79%. The progressive demasking response latencies and letter typing times for correctly typed stimuli were analyzed using linear mixed effects models in R.

## **4.1 Progressive demasking**

The analysis of progressive demasking latencies in a generalized linear mixed effects model did not yield significant differences in recognition latency related to whether the stimulus was ambiguous, whether the ambiguity resulted in a pronunciation change, or whether participants responded by saying the word aloud or by pressing the return key. There was, however, a facilitating effect of the frequency of the initial substring of the stimulus (*p* < 0.001). This is consistent with the expectation that Fuzzy Forward Lexical Activation is driven by the familiarity of an initial lexical substring. No other significant lexical frequency effects were observed in the progressive demasking task or in the typing task (e.g., for a stimulus such as *clampeel*, the frequency of *clam* had an effect, but the frequencies of *clamp*, *peel*, or *eel* did not). Lexical frequency values were obtained from the CELEX English lemma databases (Baayen et al. 1995).

## **4.2 Typing**

In the analysis of typing times, the random effects included participant, stimulus and letter typed. This last random effect was included to capture the influences of factors that could be associated with the typing of a particular letter on a keyboard. These may include whether it is a consonant or a vowel, whether it is typed with the right hand or left hand, the index finger or some other finger, etc.

The two key fixed effects in the model were the position of the letter with respect to the constituent boundary and stimulus ambiguity (ambiguous vs. nonambiguous). For the variable "letter location around boundary", four locations were targeted:


This factor was investigated as a fixed effect and in terms of its interaction with stimulus ambiguity.

There was no effect or interaction associated with whether the participant responded in the progressive demasking task by pressing the return key or by saying the stimulus aloud (*p* > 0.1). This variable was removed from the model and the participants were treated as a single group.

The analysis began with the two key factors above. To this, a number of control factors were added. These included trial (*p* = 0.003), which indicated that participants' typing got faster as they progressed through the experiment. The variable "position within the word" was also added. This variable, which was marginally significant (*p* = 0.049), showed a tendency for participants to be somewhat slower at later points in the word. The inclusion of this factor improved the model and acted as a control for the fact that the first constituent boundary for some stimuli (e.g., *clampeel*) was at letter 5 of the stimulus, whereas, for others (e.g., *damplane*), it was at letter 4. Stimuli with constituent boundaries later in the string were associated with slower typing times.

The four additional control variables that were added to the model all concern the frequency of letter sequences. All improved model performance. The first was the overall bigram frequency of the stimulus string, obtained from the English Lexicon Project (Balota et al. 2007). The second was the frequency of the letter being typed in combination with its preceding letter. The third was the frequency of the letter being typed in combination with its following letter. Finally, the fourth variable was the frequency of the two-letter sequence at the first constituent boundary. Whereas, in the first three bigram frequency measures, higher frequency was associated with faster typing times, the opposite was the case for bigram frequency at the constituent boundary. Here, higher bigram frequency slowed typing times. This observation is consistent with the view that typing involves chunking by constituent, so that high-frequency bigram transitions across the boundary, which could disrupt the segmentation of the novel compound into constituents, slow typing.
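The three position-specific bigram measures can be made concrete with a small sketch. It only extracts the letter pairs whose frequencies would be looked up; the function name `bigram_measures` and its parameters are assumptions introduced here for illustration, and no frequency values are supplied, since those came from an external resource in the study.

```python
def bigram_measures(stimulus, boundary, index):
    """Return the letter pairs behind the three position-specific measures.

    stimulus: the letter string, e.g. "anklecob"
    boundary: number of letters in the first constituent (5 for ankle|cob)
    index:    0-based position of the letter currently being typed
    (All three parameter names are hypothetical, chosen for this sketch.)
    """
    # Letter being typed together with the letter before it.
    preceding = stimulus[index - 1:index + 1] if index > 0 else None
    # Letter being typed together with the letter after it.
    following = stimulus[index:index + 2] if index < len(stimulus) - 1 else None
    # Two-letter sequence straddling the first constituent boundary.
    boundary_bigram = stimulus[boundary - 1:boundary + 1]
    return {
        "preceding": preceding,
        "following": following,
        "boundary": boundary_bigram,
    }


# Typing the "c" of "anklecob": the preceding bigram is also the boundary bigram.
at_boundary = bigram_measures("anklecob", boundary=5, index=5)
```

For the first letter of the second constituent, the "preceding" bigram and the boundary bigram coincide, which is the position at which the reported slowdown for high boundary-bigram frequency would apply.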

The results of typing times showed effects of bigram frequency, location with respect to constituent boundaries, and ambiguity. The analysis of these data patterns is presented in Table 3 and Figure 4.

As can be seen in Figure 4, the non-ambiguous stimuli (e.g., *anklecob*) show a clear typing pause at the constituent boundary, that is, at the point at which participants type the first letter of the second constituent (e.g., the *c* in *anklecob*). Typing times for the letter immediately following drop considerably, to below 200 milliseconds. In contrast, that same position shows typing times in the 250 millisecond range for ambiguous stimuli. The key difference is that, for ambiguous stimuli, that position (e.g., the *e* in *clampeel*) is at once a constituent boundary in the reading *clamp-eel* and the second letter in the reading *clam-peel*. This dual status seems to be reflected in the per letter typing times shown in Figure 4.

The notion of dual status accords with the view that lexical superstates characterize ambiguous strings such as *clampeel*. The data obtained through this experiment also seem to suggest that lexical superstates remain intact even when they could have been disambiguated as a result of reading aloud. The data showed no interaction of response type (return key vs. reading aloud) with stimulus type (unambiguous vs. ambiguous with sound change vs. ambiguous without sound change).




Figure 4: Per letter typing times for ambiguous novel compounds (e.g., *clampeel*) and non-ambiguous novel compounds (e.g., *anklecob*).

## **5 General discussion**

This chapter has focused on English novel compounds as words that are themselves multiword expressions. I have claimed that the investigation of these structures can advance our understanding of morphological processing in general and the parsing of multiword lexical strings, in particular. In that context, ambiguous novel compounds such as *clampeel* may have a special role to play. Because they are ambiguous (e.g., can be parsed into *clam-peel* or *clamp-eel*) they enable us to investigate whether lexical processing results in the activation of one structure or all possible structures. The prediction for these stimuli, in accordance with previous research by Libben et al. (1999), was that we should see evidence of the activation of all potential constituents of ambiguous novel compounds in a word typing task. This prediction is based on the claim that a core property of lexical representations is that they are shaped by patterns of lexical activity and they are commonly in a lexical superstate, rather than in any particular morphological configuration (Libben 2019).

An experiment was reported in which 24 participants each saw 45 novel compounds as progressively demasked stimuli and were required to type each of these as quickly and as accurately as possible. Typing times for each letter were recorded.

## **5.1 Lexical superstates**

The typing data are consistent with the lexical superstate hypothesis. Whereas the non-ambiguous novel compounds such as *anklecob* showed a sharp spike in letter typing times at the location between the two compound constituents, the ambiguous ones (e.g., *clampeel*) showed more moderately elevated letter typing times at both putative inter-constituent locations (e.g., between *clam* and *peel* and between *clamp* and *eel*). This pattern of results is consistent with the view that such ambiguous words are in a lexical superstate so that the language user can employ the most appropriate interpretation of the string, depending on the situation. I would argue that this phenomenon of lexical superstates is particularly easily seen in the case of ambiguous novel compounds but, in fact, is present in all putatively multimorphemic words. All such existing words are, by definition, structurally ambiguous. The simple reason for this is that they can at once have decomposed and undecomposed interpretations. Again, lexical superstates allow the language user to employ whichever of these is most appropriate or most needed under particular circumstances.

An additional reason why the investigation of ambiguous novel compounds can be revealing of the underlying principles of lexical processing is that they constitute, by their nature, a controlled experiment. They do not have existing whole word memory traces. So, when a participant encounters a novel compound, they must create an interpretation in real time. This interpretation can only be created with reference to possible internal constituents. Thus, these compounds provide us the controlled conditions under which we can investigate how lexical substrings are identified and how putative constituents are created.

## **5.2 Action-based sublexical structure**

If indeed, as I propose, morphological structure arises from lexical activity and words are more properly considered to be actions rather than things, an action-based account of how morphology comes about is required. I propose Fuzzy Forward Lexical Activation as such an account. Fuzzy Forward Lexical Activation has a maximally simple functional architecture. It claims that visual lexical processing in English proceeds from beginning to end and that, as soon as an initial lexical substring is identified, the system "looks right" to the end of the string for a possible concluding lexical substring. It then continues in a left-to-right manner so that any possible initial substrings will be longer and any possible final substrings will be shorter. In this way, the heuristic only creates initial and final substrings (i.e., those that start at the beginning of the string and those that end at the end of the string, respectively). All internal structures, therefore, will be binary. Importantly, however, these binary structures will be overlapping for all multi-constituent strings. These overlaps, created through lexical activity, constitute the structural lexical superstates for the words that are shown in Table 1.

Thus, I claim that Fuzzy Forward Lexical Activation offers a simple mechanism for the activation of sublexical elements of a word. It renders hierarchical structure epiphenomenal, but at the same time offers an explanation for why English language users have multiple interpretations for ambiguous stimuli and left-branching and right-branching interpretations for words such as *unlockable*.
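The left-to-right scan described in this section can be illustrated with a deliberately simplified sketch. It is not an implementation of the model itself: the "fuzzy" component is not modelled, the final substring is taken to be exactly the remainder of the string, and `TOY_LEXICON` and `fuzzy_forward_parses` are hypothetical names introduced here, with the lexicon limited to constituents of the chapter's own example stimuli.

```python
# Hypothetical toy lexicon, restricted to constituents of the example stimuli.
TOY_LEXICON = {"clam", "clamp", "peel", "eel", "ankle", "cob"}


def fuzzy_forward_parses(string, lexicon):
    """Collect binary (initial, final) parses found by a left-to-right scan.

    At each cut point the initial substring starts at the beginning of the
    string and the final substring ends at its end, so later parses have a
    longer initial substring and a shorter final substring.
    """
    parses = []
    for cut in range(1, len(string)):
        initial, final = string[:cut], string[cut:]
        # Once an initial lexical substring is found, "look right" to the end.
        if initial in lexicon and final in lexicon:
            parses.append((initial, final))
    return parses


ambiguous = fuzzy_forward_parses("clampeel", TOY_LEXICON)
unambiguous = fuzzy_forward_parses("anklecob", TOY_LEXICON)
```

Under these assumptions the ambiguous string yields two overlapping binary parses, while the non-ambiguous string yields one, which mirrors the contrast between the two stimulus types in the typing data.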

## **5.3 Action based lexical development is situation specific**

It is important to note that the approach to sublexical structure discussed here is, by definition, linked to the specific experience that a language user has with language processing and the specific conditions under which language processing is taking place at the time of measurement. Thus, for example, in this study, we did not observe a point at which overlaying alternative parses of ambiguous novel compounds are collapsed. It was expected that this might be observable by inspecting the interaction of response type (keypress vs. word naming) and type of ambiguous novel compounds (with sound change vs. without sound change). The reasoning behind this was that, in the word naming task, a choice between parsing alternatives would have to be made for stimuli such as *babelarch*, which have different pronunciations, depending on the parsing choice. The fact that this interaction was not observed may be related to the specific conditions of the experiment (e.g., the high density of ambiguous structures or perhaps the ability of participants to "reset" between the recognition and production components of each trial).

In addition to exploring the effects of varying task demands using stimuli of this sort, it would be valuable to investigate language demands. It seems reasonable to expect that the "look-right to the end" feature of Fuzzy Forward Lexical Activation is developed as English language users adapt to the demands and opportunities created by the English writing system. It is very likely that this is a language-specific adaptation. For German, for example, it might be expected that language users might not create final substrings that must reach to the end of the word. The reason for this is that German has unspaced tri-constituent (and longer) compounds that the English writing system does not allow. Considerations such as these enhance the probability, in my view, that the conclusions we draw concerning language processing have enhanced ecological validity. If we
accept the view that words are patterns of action, rather than static representations, then we must also expect that their psycholinguistic instantiations will correspond with individual variability in language experience.

## **Acknowledgments**

The development of this chapter was supported by the Social Sciences and Humanities Research Council of Canada Partnership Grant 895-2016-1008 "Words in the World".

## **References**



## **Chapter 5**

## **Internal constituent variability and semantic transparency in N Prep N constructions in Romance languages**

## Inga Hennecke

University of Tübingen

Constructions of the type N Prep N represent one of the most controversial issues in Romance word formation. In particular, their lexical status and their degree of productivity are still crucial points of discussion. Hence, it remains unclear whether these constructions fall within the category of morphological word formation or of syntax. Furthermore, the possibilities for internal prepositional variation remain uncertain. This article takes a constructionist approach within the framework of construction morphology in order to describe the internal constituent variability and transparency of the prepositional element in N Prep N constructions in Spanish, Portuguese, and French, as in Sp. *juego de niños*, *juego para niños* ('kid's game') or in Sp. *cabaña de árbol* and *cabaña en árbol* ('tree house'). A qualitative analysis of large-scale corpus data from the TenTen corpus family indicates that Romance N Prep N constructions may undergo internal prepositional variation. The analysis focuses on the semantic relations of the internal nominal constituents and the semantic transparency of the constructions in the three Romance languages under investigation. The results indicate that semantic relations and semantic transparency play a role in the internal constituent variability of the prepositional element.

## **1 Introduction**

Compounds of the type N Prep N, such as Sp. *bicicleta de montaña* ('mountain bike'), Fr. *salle de bain* ('bathroom'), or Pt. *história em quadrinhos* ('comic strip'), are generally considered to be the most problematic aspect of research on compounding and word formation in Romance languages. This is because these constructions represent nominal lexical units that clearly approach free syntactic
structures (de Bustos Gisbert 1986). Compounds of the type N Prep N have been treated very differently in research on compounding and have also been labeled with many different terms, such as syntagmatic compounds (Buenafuentes de la Mata 2010), syntactic compounds (Rio-Torto & Ribeiro 2009), improper compounds (Kornfeld 2009), phrasal lexemes (Masini & Thornton 2007), frozen multiword units (Guevara 2012), lexicalized syntactic constructions (Villoing 2012), lexicalized phrases (Fradin 2009), and syntactic words (Di Sciullo & Williams 1987). Generally, compounding is a mechanism whereby two lexical units are combined. Compounds of the type N Prep N are characterized as lexical units that consist of (at least) two lexical elements that are not orthographically combined. As a result, compounds of the type N Prep N, such as Sp. *traje de baño* ('bathing suit'), do not differ on a formal level from syntactic phrases of the type N Prep N, such as Sp. *libro para niños* ('book for children') (de Bustos Gisbert 1986: 69).

The most problematic issue in current research on compounding of the type N Prep N is the question of the delimitation of syntactic and lexical structures in Romance languages. As the treatment of these constructions is based largely on the theoretical background of the individual author, there is no general agreement on whether or not N Prep N constructions should be included in the class of compounds. Related to this issue is the question of whether these constructions emerge by means of productive word formation processes or are merely "fossilized" or lexicalized syntactic structures. These two crucial issues will be discussed and analyzed in this study, with a focus on one particular case of internal constituent variability, the alternation of the internal preposition in N Prep N constructions. A large-scale corpus analysis of this alternation in French, Spanish, and Portuguese supports the adoption of a constructionist approach within a framework of construction morphology. Such an approach allows the internal constituent variation of N Prep N constructions to be represented without recourse to traditional notions of lexicon and syntax.

## **2 Definition and classification of syntagmatic compounds**

As mentioned above, constructions of the type N Prep N are often excluded from descriptions of Romance compounding. Typically, they are classified together with other compound-like constructions lacking an orthographical union, as in the examples in Table 1.

Table 1: Phrasal lexemes in Romance languages (Masini 2009: 257)

According to Masini, these examples are separated orthographically, show no strong degree of idiomaticity, and appear quite frequently in each of the four languages. The question nevertheless remains whether these constructions form part of the class of compounds.

According to Guevara (2012), Spanish syntagmatic compounds, such as *fin de semana* ('weekend') or *sabelotodo* ('know-it-all'), should be excluded from the class of Spanish compounds, as these units are clearly syntactic units that contain "certain effects of lexicalization and atomicity in their distribution" (Guevara 2012: 180). In the same way, Villoing (2012) excludes French constructions such as *fil de fer* ('iron wire') and *brosse à dents* ('toothbrush') from her description of French compounds, as they are "lexicalized syntactic constructions that behave like lexical units" (Villoing 2012: 35). The approach taken by Guevara and Villoing indicates, on the one hand, that constructions of the type N Prep N are often considered to be syntactic units that lie outside the core of word formation processes. For this reason, they are regularly neglected in research papers on Romance word formation. On the other hand, this approach shows that N Prep N constructions are frequently interpreted as lexicalized syntactic constructions and, more precisely, as syntactic constructions that have somehow attained a high degree of fixedness. If this is the case, they should also be excluded from the class of Romance-language compounds, as lexicalization cannot be considered a morphological word formation process.

There is an opposing perspective according to which the constructions mentioned in Table 1 constitute a productive type of word formation and clearly follow productive morphosyntactic rules. According to Rainer, constructions of the type N Prep N are "very productive lexical patterns, which normally continue to obey the rules of […] syntax (for example, agreement rules), but may occasionally also deviate from them" (Rainer 2016: 2724). This perspective is not new and was already adopted by Benveniste (1974) in his work on French compounds of the type *robe de chambre* ('robe') and *plat à barbe* ('shaving bowl'), for which he claims indefinite productivity (Benveniste 1974: 172). In the course of the present paper, I will provide new empirical evidence in favor of this perspective using
large-scale corpus data. The analysis will show that N Prep N constructions in Romance languages are highly frequent and productive and that their internal variability follows clear morphological rules that can be mapped using construction morphology.

In order to distinguish N Prep N constructions from other phrase-like constructions, their characteristics must be clearly delineated. According to Buenafuentes de la Mata (2010), a syntagmatic compound may be defined as a lexical element that has been created by the fixation of a syntagm, which keeps its sentential structure, and therefore shows neither orthographic nor accentual union (Buenafuentes de la Mata 2010: 21ff.). De Bustos Gisbert (1986) states that Spanish N Prep N compounds differ from syntactic units on the syntactic level in two respects. First, they have a fixed word order; for example, *ojo de buey* ('porthole') cannot be reordered as \**buey ojo de*. Second, there is generally no unproblematic substitution of their constituents; for example, \**ojo de vaca* ('eye of cow') (Val Àlvaro 1999: 4825). On a morphological level, he adds that N Prep N constructions show the same characteristics as other compounds in terms of gender and number agreement, the presence of composition markers, and the ability to undergo further derivation and to form collocations (de Bustos Gisbert 1986: 77). According to Masini, N Prep N constructions are of major interest, as they follow the syntactic rules of head modification of a nominal phrase by a prepositional phrase. This means that in Romance languages, N Prep N constructions are generally left-headed, and that inflectional processes are performed on the head of the construction (Masini 2009: 257). Val Àlvaro (1999: 4827) adds a fundamental characteristic on the semantic level: the absence of compositional meaning that may lead to syntactic reinterpretation of the complex nouns. This means that syntagmatic compounds, in contrast to syntactic units, represent one single naming unit at the semantic level; that is, they refer to one specific conceptual representation, as in Fr. *sac à main* ('purse').

In this paper, I will focus on the syntactic criteria given by de Bustos Gisbert, specifically, on the impossibility of constituent substitution. This criterion does not appear to be suitable for purposes of differentiating syntactic and lexical elements, as the delimitation between syntactic and lexical N Prep N constructions remains a matter of controversy. Here, I will show that variation of the internal preposition can be best explained within a constructional framework. I will then argue with regard to the internal preposition that not only is the substitution of internal constituents possible in N Prep N constructions, but it is also a rule-governed process and depends largely on semantic factors, particularly the semantic relation of the nominal constituents.

When investigating the semantic relations of constituents of N Prep N constructions, it is crucial to consider the notions of semantic transparency and semantic opacity. In current research, the term semantic transparency refers to the degree to which the meaning of a complex construction can be derived from the meaning of its constituents (Zwitserlood 1994). For example, the French N Prep N construction *salle de bains* ('bathroom') is considered semantically transparent, whereas the Spanish construction *ojo de buey* ('porthole', lit. 'bull's eye') is considered semantically opaque. Bell and Schäfer view semantic transparency and semantic opacity as scalar notions, lying at either end of a continuum (see Bell & Schäfer (2016) for a detailed discussion on semantic transparency). Later in the present study, I will discuss whether the semantic transparency of an N Prep N construction determines the possibility of internal constituent variation.

## **3 Internal constituent variation in N Prep N constructions: The role of the preposition**

Characteristic of N Prep N constructions, and a crucial factor in their delimitation, is their resistance to paradigmatic variation. In the context of the delimitation of nominal compounds and noun phrases of the type N Prep N in Portuguese, Rio-Torto & Ribeiro (2012: 9) state that the "(im)possibility of lexical insertion" is one of the most important tests of compoundhood. They go further, claiming that if internal changing is allowed, "we are no longer dealing with compounds ([N[PrepN]]N) but with noun phrases ([N[PrepN]]NP)" (ibid.). When speaking of internal change, Rio-Torto and Ribeiro refer principally to changes in determination, as in Pt. *fim de semana* ('weekend') and Pt. *fim da semana* ('end of this week'), and to changes effected through insertion of lexical material, as in *fim da última semana* ('end of last week'). As these examples suggest, internal constituent variation is generally seen as a crucial test of delimitation between compounds and syntactic structures. Similarly, Masini argues that, for lexical elements, "paradigmatic variation is blocked, since the words in the construction cannot be substituted by a near-synonym, which should not be a problem for normal phrases" (Masini 2009: 259). Masini defines paradigmatic blocking as the inability to replace a constituent of the construction by another paradigmatically fitting constituent. She also refers to cases of paradigmatic blocking of a nominal unit of an N Prep N construction, as in the Italian examples *casa di cura* ('nursing home') and \**abitazione di cura* ('\*nursing domicile'). In this case, *casa di cura* is a fixed naming unit that loses its meaning when there is paradigmatic variation of a nominal element. The following analysis will show
that paradigmatic blocking holds particularly true for N Prep N constructions with a stronger degree of semantic opacity and idiomaticity. More transparent N Prep N constructions allow productive and rule-governed internal constituent alternation, as the analysis will show by means of the prepositional constituent.

In the literature, all references to a delimitation test of constituent variability neglect the prepositional constituent in N Prep N constructions. The prepositional element is fundamental in N Prep N constructions, but its status is far from clear. In all the Romance languages under investigation here, the preposition *de* is the most frequently used prepositional constituent in N Prep N constructions. In the case of Spanish, Buenafuentes de la Mata (2010) cites various examples of N Prep N constructions with prepositions other than *de*, such as *leche en polvo* ('milk powder'), *cita a ciegas* ('blind date'), *caridad con uñas* ('self-serving favor'), *pozo sin fondo* ('bottomless pit'), and *caballo con arcos* ('pommel horse'). She adduces the appearance of prepositions other than *de* as evidence for the structural complexity of N Prep N constructions in Spanish. The same case can be made for the other languages under investigation in this paper (i.e. French and Portuguese), which show the same ability to form N Prep N constructions with other prepositions.

This paper concentrates on a specific set of (partially) synonymous prepositions in French, Spanish, and Portuguese: Fr. *de* ('of'), *à* ('to'), *en* ('in'), and *pour* ('for'); Sp. *de* ('of'), *a* ('to'), *en* ('in'), and *para* ('for'); and Pt. *de* ('of'), *a* ('to'), *em* ('in'), and *para* ('for'). These prepositions may all appear in N Prep N constructions and they may all undergo internal alternation and variation in the three languages under investigation. Consider examples (1–3) from the TenTen corpora:

- (1b) Pt. *água de lavagem – água para lavagem* 'wash water'
- (1c) Fr. *livre d'enfant – livre pour enfants* 'children's book'
- (2b) Fr. *jauge d'essence – jauge à essence* 'fuel gauge'
- (2c) Pt. *fogão de lenha – fogão a lenha* 'wood stove'
- (3b) Pt. *bracelete de aço – bracelete em aço* 'steel bracelet'
- (3c) Sp. *ciclismo de pista – ciclismo en pista* 'track cycling'

Example (1) shows internal variation of the prepositional elements *de* and *pour/para*. While the constructions containing *de* are considered to have a lexical
status, the constructions with *pour/para* are generally considered to be syntactic constructions, as they pass certain of the classification tests mentioned above. In contrast to the construction with *de*, they allow substitution and insertion, as the two tests of compoundhood demonstrate: Sp. *fuentes de vidrio para horno* ('glass casseroles for the oven'), *fuentes profundas para horno* ('deep casseroles for the oven'), but \**fuentes de vidrio de horno* and \**fuentes profundas de horno*. Example (3b) demonstrates the internal alternation of the prepositions *de* and *en/em*. Here, alternation is possible without changing the semantic context of the whole construction or its degree of semantic transparency. In example (2), the prepositions *de* and *a/à* alternate in N Prep N constructions without changing the lexical status of the respective constructions. Nonetheless, these constructions differ in their frequency of usage, productivity, and fixedness, as well as in their degree of lexicalization and of idiomaticity. Especially in French, alternation of *de* and *à* may indicate a change in meaning, as in *verre de vin* ('glass of wine') and *verre à vin* ('wine glass'). In this case, the interpretation of both constructions as two distinct products of word formation is reasonable (this specific case will be discussed in detail in the course of the corpus analysis). In other cases, such as Pt. *fogão de lenha – fogão a lenha* ('wood stove'), no clear semantic difference is visible, as attested by native speakers of Brazilian and European Portuguese: internal variation is possible inside one construction (a more detailed discussion of the examples will follow in the upcoming section). As mentioned above, authors including Rio-Torto & Ribeiro (2009) interpret constructions of the type shown in examples (1–3) as syntactic units, on the grounds that they do not pass all the delimitation tests for compoundhood. 
The following theoretical discussion and empirical analysis will show that it is neither necessary nor possible to draw a clear distinction between syntactic constructions and lexical constructions; the possibility of alternating prepositional elements in N Prep N constructions depends largely on the semantic function of the N2, and the fixedness, semantic transparency and the idiomaticity of the whole construction.
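As a rough illustration of what identifying prepositional alternation in a list of attested strings could involve, the sketch below groups N Prep N strings by their nominal constituents and keeps the pairs attested with more than one preposition. It is an assumption-laden toy, not the procedure used in the corpus analysis: the function name `group_alternations` is hypothetical, the input is limited to examples quoted in this chapter, and the naive whitespace split does not handle elided forms such as Fr. *d'enfant* or number variation such as *enfant* vs. *enfants*.

```python
from collections import defaultdict


def group_alternations(constructions):
    """Map each (N1, N2) pair to the set of prepositions attested between them."""
    attested = defaultdict(set)
    for phrase in constructions:
        tokens = phrase.split()
        if len(tokens) != 3:
            continue  # skip anything that is not a plain three-token string
        n1, prep, n2 = tokens
        attested[(n1, n2)].add(prep)
    # Keep only the pairs that occur with more than one preposition.
    return {pair: preps for pair, preps in attested.items() if len(preps) > 1}


# Strings taken from the examples quoted in this chapter.
examples = [
    "fogão de lenha",
    "fogão a lenha",
    "bracelete de aço",
    "bracelete em aço",
    "traje de baño",
]
alternating = group_alternations(examples)
```

Here *fogão … lenha* and *bracelete … aço* are returned as alternating pairs, whereas *traje de baño*, attested in this list with a single preposition, is not.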

Another problem in analyzing alternation of prepositional elements concerns the role of the prepositions. In Romance languages, the prepositions *de* and *à/a* in particular have often been considered semantically "empty" units that do not contain meaning. This perspective has often been applied to the prepositional element in N Prep N constructions: for example, Bartning (1993: 164) states that prepositions in French N Prep N constructions do not code any specific meaning and that they function only as linking elements. Similarly, Bartning (1993), referring to Cadiot (1997), describes these prepositional elements as "colorless prepositions" (Bartning 1993: 164). For the French prepositions *de* and *à*, Bosredon & Tamba (1991: 44) use the term "opérateur de couplage" ('linking operator'). Cadiot (1997) sees prepositions in N Prep N constructions as elements that express the operation of a construction or the denomination of a subclass of N1; he associates the prepositional element with a "referential calibration" of N1. In contrast, Laumann (1998: 32) notes the importance of distinguishing between different types of meaning when investigating the function and meaning of prepositions in nominal compounds. He differentiates between system meaning (*Systembedeutung* – the sum of the meaning patterns of the constituents in the N Prep N construction), word meaning (*Wortbedeutung* – the meaning of the construction on the level of word formation), and lexicon meaning (*Wortschatzbedeutung* – the meaning of the construction as a naming unit in the lexicon).

Other authors interpret the possibility of elision of the prepositional element, as in Sp. *ducha de teléfono* > *ducha teléfono* ('detachable shower head') or Sp. *crédito de vivienda* > *crédito vivienda* ('home loan'), as evidence of the semantic emptiness of the preposition. However, the elision of the prepositional element is only possible in certain strongly lexicalized constructions. Therefore, a counterargument may be based on the same evidence, given that the elision of the prepositional element is not possible in most cases. In the present paper, I argue that the elision of prepositional elements is not proof of a lack of semantic content. The elision can be explained in terms of common processes of language change that may or may not take place in certain lexicalization processes within complex units. The alternation between prepositional elements, as exemplified above, is a productive word-formation process that differs clearly from mere lexicalization processes. Therefore, the following qualitative corpus analysis considers the internal constituent alternation of the prepositional element from a comparative perspective and does not focus on the elision of this element. The analysis adopts a constructionist approach based on Goldberg (1995; 2006), with a special focus on construction morphology as introduced by Booij (2010; 2015).

## **4 N Prep N constructions in construction grammar and morphology**

Since Goldberg's seminal work *Constructions* (1995), the constructional approach has had a strong impact on linguistic research. Constructions are considered as conventionalized form-meaning pairs that can be found at all levels of abstraction in language, are dynamically formed, and may be changed continuously. They are acquired via general processes of abstraction, generalization, and categorization. Goldberg (2006: 5) considers any linguistic unit to be a construction,
if "some aspect of its form or function is not strictly predictable from its component parts". Furthermore, units are considered to be stored as constructions even if they are fully predictable, provided that they are sufficiently frequent (ibid.).

In his theory of construction morphology, Booij (2015) applied the general notions and concepts of construction grammar to units that have traditionally been regarded as morphological. The underlying assumption of the theory of construction morphology is that a construction may have characteristics that cannot be derived from its constituents (Booij 2015: 3). Booij cites the example of the reduplication of nouns in Spanish in order to express the notion 'real', as in *un café café* ('a real coffee'). He establishes the notion of conceptual schemas and subschemas, defined as schematic representations of morphological constructions. These schemas represent a correlation between form and meaning:

(4) (Booij 2015: 2) <[[x]<sub>Vi</sub> er]<sub>Nj</sub> ↔ [Agent of SEM<sub>i</sub>]<sub>j</sub>>

This example indicates that a word with base x, in this case an English infinitive verb form, can be turned into a noun with the meaning 'agent of the base word (SEM)' by adding the suffix *-er* (Booij 2015: 2). The variable x denotes the phonological content of the base word, i denotes the meaning of the base word, and j shows that the meaning of the complete construction depends on the form of the complete construction (ibid.). Masini (2009: 261) applies the theory of construction morphology to constructions of the type N Prep N in Italian, taking them to represent an abstract template that is stored in the mental lexicon. Masini further notes that this abstract template features a certain degree of productivity and is associated with a concrete naming function (ibid.). By means of a specific inheritance mechanism, based on instance inheritance links (Goldberg 1995), increasingly specific constructions can be derived from the abstract template. This may be done by categorical specification (filling an unspecified slot with a specific category), as in N Prep N or N Prep V, by lexical specification (filling a slot with specific lexical material), as in N *de* N, or by a completely lexical construction, such as It. *casa di cura* (Masini 2009: 261). Figure 1 demonstrates the application of this theory to French constructions of the type N Prep N.

This figure shows the inheritance hierarchy from the abstract template [N1 *de* N2]N, which here is an intermediate construction of the abstract templates [N1 Prep Y] and [N1 Prep N2]. From the level [N1 *de* N2]N, it is possible to proceed to a second intermediate lexical level, which indicates the semantic function of N2, and to conclude at a completely lexical level, which shows the lexical result with a concrete naming function. According to Masini (2009: 263), this model can also clarify and describe new occurrences of the N1 *de* N2 construction.

Figure 1: Inheritance hierarchy for N Prep N templates in French (Masini 2009: 263)

The following qualitative corpus analysis aims to apply the concept of construction morphology presented by Booij (2010; 2015) and exemplified by Masini (2009) to a cross-linguistic comparative analysis of large-scale corpus data for Spanish, French, and Portuguese N Prep N constructions. The focus of the analysis is on constituent variation of the internal prepositional element in N Prep N constructions in these three languages, and I will apply Masini's inheritance hierarchy template (Figure 1) to the internal variability of prepositional constituents. It is useful to include a further intermediate level prior to the first and second levels of the hierarchy for N Prep N templates mentioned above. This additional level contains the abstract template with the semantic function of N2, which in the following corpus analysis is shown to be a crucial factor in determining the possibility of internal prepositional variation. For the purposes of the present analysis, the inheritance hierarchy for N Prep N templates may be visualized as in Figure 2.

This figure shows the inheritance hierarchy adapted from Figure 1 by means of example (3a). As mentioned above, the added abstract intermediate levels are intended to reflect the possibility of prepositional variability for certain N Prep N constructions and the dependence of this variability on the semantic function of the nominal constituents of the construction. The objectives of the following qualitative corpus analysis are to apply the inheritance hierarchy in Figure 2 to a large-scale corpus of natural speech data for Spanish, French and Portuguese and to compare the internal prepositional variability of N Prep N constructions in these three languages.


Figure 2: Inheritance hierarchy for internal variation in N Prep N templates in French (adapted from Masini 2009)
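The inheritance hierarchy described above can be thought of as a tree in which each node is a template derived from a more abstract parent. The following Python sketch is only an illustration of that idea, not part of the original analysis: the class `Template`, the level labels, and the choice of *course de/à vélo* as the lexical endpoint are assumptions made for this example, and the exact levels shown in Figure 2 may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    """One node in a constructional inheritance hierarchy (illustrative only)."""
    form: str                                # e.g. "[N1 Prep N2]"
    level: str                               # hypothetical label for the level of abstraction
    parent: Optional[Template] = None
    children: list[Template] = field(default_factory=list)

    def specify(self, form: str, level: str) -> Template:
        """Derive a more specific template and link it to this one."""
        child = Template(form=form, level=level, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Return the forms from the most abstract template down to this node."""
        chain = []
        node: Optional[Template] = self
        while node is not None:
            chain.append(node.form)
            node = node.parent
        return list(reversed(chain))


# Hypothetical path through the hierarchy, using a French example from Section 5.2.
root = Template("[N1 Prep Y]", "abstract template")
n_prep_n = root.specify("[N1 Prep N2]", "categorical specification")
semantic = n_prep_n.specify("[N1 Prep N2transport]", "semantic function of N2")
alternation = semantic.specify("[N1 de/à N2transport]", "lexical specification of Prep")
lexical = alternation.specify("course de/à vélo", "fully lexical construction")

for step in lexical.lineage():
    print(step)
```

Representing the alternation `de/à` as a single node reflects the view that the two prepositions are variants within one template; a case such as *conteneur de/à déchets*, where two distinct naming units result, would instead call for two sibling nodes.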

## **5 Qualitative corpus analysis**

As mentioned above, the present corpus analysis is intended to investigate internal constituent alternation of the prepositional element in N Prep N constructions in Spanish, French and Portuguese. The focus is on the alternation between *de* and *à/a*, *de* and *en/em*, and *de* and *pour/para*. This study builds on a quantitative corpus survey on the internal alternation of the prepositional element in N Prep N constructions in Spanish, French, and Portuguese by means of large-scale corpus data (Hennecke & Baayen 2017). Hennecke & Baayen (2017: 144) showed that internal prepositional variation in the three languages under investigation is possible, but that these languages show different characteristics in terms of frequency and productivity of such alternations. The quantitative analysis of the three languages focused on frequency of types and tokens, productivity (i.e. probability of previously unobserved types), and population size (i.e. potential number of formations) (Hennecke & Baayen 2017: 139). The results show that Portuguese and, to a lesser extent, French allow productive internal constituent variation of the prepositional element. In contrast, Spanish does not show productivity in internal variation, which is demonstrated by the absence of hapax legomena (ibid.). At the same time, Spanish has the greatest tendency to employ the preposition *de* in N Prep N constructions. In French, the prepositions *à* and *pour* are slightly more productive than in the other two languages. Moreover, French tends to avoid constructions using *avec*, whereas constructions with *com* are productive in Portuguese. The latter tendency may be explained by the fact that French prefers NA-constructions over constructions of the type N *avec* N.
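The quantitative notions mentioned above (type and token frequency, and productivity as the probability of encountering a previously unobserved type) can be made concrete with a small sketch. It assumes a hapax-based estimate of productivity, namely the number of types that occur exactly once divided by the number of tokens; whether this is the exact estimator used in Hennecke & Baayen (2017) is an assumption here. The function name `productivity_profile` and the toy data are invented for illustration and are not corpus counts.

```python
from collections import Counter


def productivity_profile(tokens):
    """Summarize a list of construction tokens (one string per corpus hit)."""
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    # Hapax legomena: types attested exactly once in the sample.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    # Assumed estimator: share of tokens that belong to once-only types.
    p = hapaxes / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": n_types, "hapaxes": hapaxes, "P": p}


# Toy sample with invented frequencies (not taken from the TenTen corpora).
toy_sample = ["motor a combustão"] * 3 + ["bomba a vácuo"] * 2 + ["forno a lenha"]
profile = productivity_profile(toy_sample)
print(profile)

# A sample without hapax legomena yields an estimate of zero.
no_hapax = productivity_profile(["freno a disco"] * 4)
print(no_hapax["P"])
```

Under this estimator, a language whose alternating constructions include no hapax legomena receives a productivity estimate of zero, which corresponds to the reasoning reported above for Spanish.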

The aim of the present corpus analysis is to investigate the results from the above-mentioned study from a qualitative perspective. In this qualitative survey, the internal prepositional variability will be investigated from a mostly semantic perspective, combined with a constructionist approach. Here, the focus will be on

### Inga Hennecke

which nominal semantic functions allow prepositional variability and whether the variability depends on the semantic transparency of the construction. To that end, this corpus analysis is based on the same dataset as in Hennecke & Baayen (2017), namely three web corpora from the TenTen corpus family from Sketch Engine: the French corpus frTenTen12, the Spanish corpus esTenTen11 and the Portuguese corpus ptTenTen11. The TenTen corpora are large-scale web corpora with the counts displayed in Table 2.

Table 2: Corpus Information of the TenTen corpora for Spanish, French and Portuguese (https://the.sketchengine.co.uk)


In order to perform a qualitative analysis of the data, all N Prep N constructions were extracted automatically from the corpora, keeping only those that appear with more than one internal prepositional element. The present analysis focuses exclusively on N Prep N constructions and therefore excludes constructions of the type N Prep Det N. The data were then inspected manually, and the following were excluded: grammaticalized constructions (for example Fr. *face à N*, Sp. *gracias a N* 'thanks to N'), binominal pairs (e.g. Fr. *temps en temps* 'time to time', Sp. *día a día* 'day to day'), and antonymic pairs (Fr. *chien avec/sans laisse* 'dog with/without leash'). Table 3 shows the underlying dataset for the qualitative analysis.

Table 3: Type and token counts for the underlying dataset with all pairs of nouns that are attested with at least two different internal prepositions
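The automatic selection step described above (keeping only noun pairs that are attested with at least two different internal prepositions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the extraction pipeline actually used: the input format of `(n1, prep, n2, frequency)` tuples, the function name `alternating_pairs`, and all frequencies are invented for the example, and the subsequent manual exclusion step is not modeled.

```python
from collections import defaultdict


def alternating_pairs(hits):
    """Group hits by noun pair and keep pairs attested with >= 2 prepositions.

    `hits` is an iterable of (n1, prep, n2, frequency) tuples (assumed format).
    """
    by_pair = defaultdict(dict)
    for n1, prep, n2, freq in hits:
        preps = by_pair[(n1, n2)]
        preps[prep] = preps.get(prep, 0) + freq
    return {pair: preps for pair, preps in by_pair.items() if len(preps) >= 2}


# Toy input with invented frequencies (not taken from the TenTen corpora).
hits = [
    ("freno", "de", "disco", 120),
    ("freno", "a", "disco", 9),
    ("botas", "de", "agua", 500),  # attested with one preposition only: dropped
]
result = alternating_pairs(hits)

n_types = len(result)
n_tokens = sum(sum(preps.values()) for preps in result.values())
print(result, n_types, n_tokens)
```

Counting each retained noun pair as one type and summing its frequencies across prepositions is one possible way to arrive at type and token counts of the kind reported in Table 3; the counting convention of the original study may differ.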


This dataset shows important differences in type-token frequency between the three languages (Hennecke & Baayen 2017). Portuguese presents by far the greatest number of different types and tokens of N Prep N constructions with more than one internal preposition. In contrast, Spanish has very few different types but a considerable number of tokens; that is, a small number of different N Prep N constructions appear quite often in the corpus data. The French data show a significantly higher number of different types than the Spanish data, but a lower number of tokens; here, a larger number of types occur less often in the corpus data (for a detailed quantitative analysis of the data see Hennecke & Baayen 2017). In what follows, a qualitative analysis of selected pairs of internal prepositions is presented in order to investigate whether these differences also appear at a qualitative level, with a special focus on the semantic functions of the nominal constituents and the semantic transparency of the constructions. The specific semantic relations were established with regard to the current literature on the semantic relations of nominal constituents in nominal compounds (Gagné & Shoben 1997; Gagné & Spalding 2009; Girju et al. 2005). They were subsequently modified and adapted to the specific case of N Prep N constructions in the corpus data under investigation. It is not possible to list and discuss all occurrences of all types in the present paper; only selected examples will therefore be discussed and analyzed. Where necessary, references will be made to frequency of occurrence.

## **5.1 The preposition** *de* **in N Prep N constructions**

In all three languages under investigation, the preposition *de* is most often used to combine two nominal expressions, as in Fr. *salle de bain* ('bathroom'), Sp. *botas de agua* ('rubber boots'), or Pt. *moinho de vento* ('wind mill'). Therefore, the preposition *de* appears in all pairs of internal prepositional variation analyzed in the following section. The three data sets also show internal variation for prepositions other than *de*, but these pairs are not the subject of the present analysis. As mentioned above, the preposition *de* has been much discussed; it has often been considered an "empty" or "colorless" preposition that lacks any kind of semantic content and that merely fulfills a linking function. This completely functional approach is not adopted in the present paper for reasons given above. In the present account, I follow a constructionist approach (see Masini 2009), in which the prepositional constituent in N Prep N constructions is an element of semantic consequence to the whole construction (Masini 2009: 262). Masini cites the example of Italian N1 *di* N2 intermediate lexical constructions, which clearly differ semantically from N1 *a* N2 intermediate lexical constructions. With reference to Johnston & Busa (1996), she emphasizes that "the prepositions *da*, *di* and *a* in Italian N+PREP+N expressions, under certain conditions and in combination with certain classes of nouns, are specialized for different kinds of modification" (Masini 2009: 262). In the further analysis, the statement from Masini will be refined, since in the present data, the intermediate lexical constructions N1 *de* N2 and other intermediate lexical constructions (e.g. N1 *para/pour* N2) may overlap semantically under certain conditions. These cases will be exemplified below in a cross-linguistic comparative analysis. In this analysis, the preposition is seen not as a semantically opaque constituent but as a constituent with a specific semantic value determined by the semantic functions of the nominal constituents.

The preposition *de* in French, Spanish, and Portuguese has been described as expressing various relations (Bartning 1993: 187). In binominal constructions, it expresses, for instance, a relation of possession (Sp. *el ordenador de Luis* 'Luis's computer', Fr. *la voiture de Jean* 'John's car'), characterization (Fr. *statut de valeur* 'status'), instrument (Fr. *coup de bâton* 'blow'), material (Fr. *papier de soie* 'silk paper'), a part-whole relation (Sp. *puerta de casa* 'front door', Pt. *ponta do dedo* 'fingertip'), an affiliation (Fr. *fils de roi* 'king's son'), a content (Fr. *tasse de café* 'cup of coffee'), a defining characteristic (Sp. *hotel de lujo* 'luxury hotel') or a purpose (Pt. *vestido de noiva* 'wedding dress'). For more examples in French see Lang (1991: 291ff.).

## **5.2 Internal variation between** *de* **and** *a/à*

Internal variation between the prepositional constituents *de* and *à* has been the subject of several articles and books on French prepositions and nominal syntagms (e.g. Anscombre 1990; Lang 1991; Bosredon & Tamba 1991; Cadiot 1997). However, it is interesting that this discussion has no equivalent in the literature on Spanish and Portuguese prepositions. This is because such internal variation does not take place in Spanish and only to a small extent in Portuguese. The sole example of internal variation of *de* and *a* in the Spanish corpus data is the following:

(5) [N1 *de/a* N2type/specification]<sup>N</sup> *freno de/a disco* 'disk brake'

Here, the construction containing *de* is far more frequent, and the lexicalized form can be found in dictionaries. Still, the construction *freno a disco* also occurs regularly in the esTenTen corpus data, with a frequency of 0.10 occurrences per million. However, the corpus data show that the internal variation of *de* and *a* is neither frequent nor productive in Spanish, as only one type can be found in this large-scale internet corpus. In Portuguese, the ptTenTen data show at least two intermediate lexical constructions with variation of *de* and *a* that present a certain productivity:


The template [N1 *de/a* N2type/specification]N, in particular, is frequently present in the corpus data and is expressed via different types, as in *motor de/a combustão* ('combustion motor') or *bomba de/a vácuo* ('vacuum pump'). It is striking that many of these types are technical terms. It is possible to perceive a semantic difference between the two intermediate constructions: the type N1 *a* N2 more clearly indicates the material part of the N2 constituent, whereas the type N1 *de* N2 focuses semantically on complementing N1 and creating a construction that is a subtype of N1. However, the first sample surveys and questionnaires revealed that native speakers of European and Brazilian Portuguese do not perceive a difference in the semantic meaning patterns or, more precisely, in the semantics of the whole construction.

For the French data, a very different pattern appears in the analysis of internal variability of *de* and *à*:


[N1 *de/à* N2type/specification]<sup>N</sup> *course à/d'obstacles* 'obstacle course'

[N1 *de/à* N2container]<sup>N</sup> *conteneur de/à déchets* 'waste bin/bin with waste'

(9) [N1 *de/à* N2transport]<sup>N</sup> *course de/à vélo* 'biking trip'

The existing literature on *de-à* alternation in French emphasizes that there is a semantic difference between binominal constructions containing *de* and *à* and that this semantic difference affects not only the prepositional element itself but also the whole naming unit. This becomes very clear in a more detailed analysis of the examples from the template [N1 *de/à* N2container]N. In all these examples, the intermediate lexical construction N1 *à* N2 designates the container itself, as in *flûte à champagne* ('champagne glass') or *corbeille à fruit* ('fruit bowl'). In contrast, the intermediate lexical construction N1 *de* N2 denotes the content of the container, as in *flûte de champagne* ('a glass of champagne') or *corbeille de fruit* ('a bowl of fruits'). In these cases, according to Cadiot (1997), *de* turns the interpretation of the construction toward the N2 and constructs a quantified image of the referent, whereas *à* turns the interpretation toward the N1 and permits a qualified image of the referent (Cadiot 1997: 44). That is, *de* carries an effect of quantification whereas *à* carries a semantic notion of qualification. For cases of the intermediate lexical construction [N1 *de/à* N2ingredient]N, such as *salade d'écrevisses* and *salade aux écrevisses* ('crawfish salad'), Lang (1991) states that the preposition *à* connects N1 and N2, whereas the preposition *de* derives N1 from N2. That is to say that *à* describes an ingredient, whereas *de* describes a substance (Lang 1991: 283). In the same way, in the examples of [N1 *de/à* N2type/specification]<sup>N</sup> and [N1 *de/à* N2means of transport]N, it can be seen that *à* points to the material object *vélo* or *obstacles*, whereas *de* more likely complements the N1, and hence the whole construction describes a subtype of N1. According to Cadiot (1997: 43), the semantic differences that occur through the variation of the prepositions *de* and *à* can be accounted for in terms of a more abstract categorization, namely the opposition of intension and extension. On this view, *de* constructs an extensional reference directly, whereas *à* creates an extensional reference indirectly by passing through an intensional reference (Cadiot 1997: 62).

From a constructionist perspective, it can be stated that only in the template [N1 *de/à* N2container]<sup>N</sup> does the semantic value of the whole construction change, as in *conteneur de déchets* ('bin containing waste') and *conteneur à déchets* ('waste bin'). In this case only, we have two different naming units when *de* and *à* alternate. Therefore, only here is it appropriate to refer to two different constructions, [N1 *de* N2container]<sup>N</sup> and [N1 *à* N2container]N, which lead to two different naming units at the lexical level. In all the other cases mentioned above, the variability of *de* and *à* does not lead to different semantic interpretations of the lexical outcome, but only to a difference in the semantic weight of certain meaning patterns in the interpretation. Therefore, in all other cases, the inheritance hierarchy from the previous section of this paper can be applied in order to capture the internal constituent variation.

To conclude this analysis, it can be stated that all constructions that allow internal constituent variation of the prepositional element are semantically transparent. The analysis shows that alternation of the internal prepositional constituent does not occur in the semantically more opaque constructions in the languages under investigation, since in these cases the semantic functions of the nominal constituents cannot always be clearly determined. In Spanish and Portuguese, the internal variation is only possible in very specific cases of semantic function of the nominal constituents. In French, on the other hand, the internal variation of *de* and *à* is observed more frequently and appears to be governed by the semantic functions of the nominal constituents.

## **5.3 Internal variation between** *de* **and** *em/en*

The variation between *de* and *en/em* in N Prep N constructions has received little attention in the literature. On a general level, Lang (1991: 411) states that in French, *en* between two nouns indicates the location of N1, as in *arc-en-ciel* ('rainbow') and *une ville en Italie* ('a city in Italy'), the characterization of N1, as in *ange en stuc* ('stucco angel'), a way of preparation of N1, as in *une salade en vinaigrette* ('a salad with dressing'), the material of N1, as in *robe en soie* ('silk dress'), the form in which N1 appears, as in *fleurs en bouquet* ('bouquet of flowers'), the condition in which N1 stands, as in *arbre en fleur* ('blooming tree'), or a field in which N1 operates, as in *expert en assurances* ('insurance expert'). According to Laumann (1998: 55), French *de* and *en* are not always interchangeable when N2 refers to the material of N1. On the basis of an analysis of French grammar and dictionary entries, Laumann states that *en* appears more regularly with a predicative supplement than *de*, gives more concrete information about the material, and is less strongly linked to the N1. However, the most important difference seems to be that *en* cannot appear in more opaque constructions with a (partially) idiomatic reading. Laumann (1998: 55) cites the examples of *homme de fer* ('iron man') and *yeux d'acier* ('steely eyes'), where it is not possible to substitute *en* for *de*. In the French data, most of these relations can also be seen in variations of *de* and *en*, as in the following examples:



In each of these cases, the variation between *de* and *en* does not trigger any strong meaning difference between the two construction types; that is, it is possible to talk about internal variability rather than about two different types of constructions referring to different naming units. Nonetheless, certain differences are visible, as Laumann (1998) pointed out. For instance, *en* is generally less closely linked to N1 and more often introduces a complement. Constructions with *en* also appear to have a lesser degree of fixedness and put the focus on the N2. The present analysis confirms Laumann's observation that the alternation of *de* and *en* is only possible in semantically transparent constructions that do not include any idiomatic meaning.

A very similar picture emerges from the analysis of the Portuguese data, as shown in the following examples:


The Portuguese data show almost the same intermediate lexical constructions that function with a variation between *de* and *em*. The only difference is in the intermediate construction [N1 *de/em* N2medium]N, where N2 designates the medium via which N1 is transferred. In contrast, the French data offer the intermediate construction [N1action *de/en* N2material]N, which indicates a concrete action referring to a specific (raw) material. Nevertheless, from a quantitative perspective, the internal variation between *de* and *em/en* is far more frequent in the Portuguese data.

From a quantitative perspective, the variation between *de* and *en* is quite rare in the Spanish data, but the qualitative analysis shows a more diverse picture:



(17) [N1 *de/en* N2location]<sup>N</sup> *ciclismo de/en pista* 'track cycling'

[N1 *de/en* N2condition]<sup>N</sup> *obras de/en construcción* 'construction site'

(18) [N1 *de/en* N2medium]<sup>N</sup> *entrevista de/en radio* 'radio interview'

The examples show that Spanish allows the same internal variation as Portuguese, except that the template [N1 *de/en* N2group]<sup>N</sup> was not present in the data. In Portuguese and French, there are no strong meaning differences between the two templates, and therefore they can be counted as variants rather than as two distinct forms. In Spanish, as in French and Portuguese, the same subtle differences in the degree of fixedness and focus of the constituents can be observed.

Overall, the variation between *de* and *en/em* is possible in all three languages under investigation. The differences appear to exist at the quantitative level rather than in the specific semantic meaning patterns. In all three languages, different templates can demonstrate and explain the possible alternation between *de* and *en/em*. For most cases, these templates overlap in the three languages. Therefore, it is possible to apply the inheritance hierarchy mentioned in Section 4 to all of the examples.

## **5.4 Internal variation between** *de* **and** *pour/para*

For French binominal compounds of the type N Prep N, Laumann (1998) states that the preposition *pour* occurs quite rarely. This may be explained by the fact that *pour* is less abstract than other prepositions, such as *de* or *à*: that is, *pour* indicates a very concrete meaning of purpose or determination, whereas *de* shows a less definite meaning pattern. Therefore, *de*, as a semantically more opaque constituent, offers a wider scope for application than *pour*, but in some cases both prepositions are interchangeable, as in the following examples:


(20) [N1 *de/pour* N2user(object)]<sup>N</sup> *musique de/pour piano* 'piano music/music for piano'


The French data show that the variation between *de* and *pour* is possible only in cases where N2 designates a user (or a beneficiary), or where N2 specifies the purpose of N1. In all three templates given above, N2 serves to form a subtype of N1. However, the templates containing the preposition *de* point more clearly to the N1 and focus on the interpretation of the whole template as a subtype of N1. In templates containing the preposition *pour*, the preposition is clearly attached to the N2, and the semantic emphasis is on N2. Furthermore, the preposition *pour* clearly carries the interpretation 'for', whereas the constructions containing *de* leave room for ambiguous interpretation. While *musique pour piano* clearly designates music (a piece of music or composition) for piano, *musique de piano* may also refer to music played by a piano (and not necessarily composed for playing on a piano). In this sense, *pour* helps to resolve ambiguity and allows only the interpretation 'designed for'. For the Spanish data, the pattern is quite similar to the French data, as in the following examples:

(21) [N1 *de/para* N2user]<sup>N</sup> *club/ropa de/para niños* 'children's club/clothes'

[N1 *de/para* N2purpose]<sup>N</sup> *alimentos de/para consumo* 'consumer goods'

(22) [N1 *de/para* N2user(object)]<sup>N</sup> *juego de/para pc* 'PC game'

These cases show that the variation between *de* and *para* is possible only in contexts in which N2 semantically represents a user of N1 (a person or an object) or a specific purpose for N1. These are the same templates that were found for the French data above. This result contradicts the findings of López (1970), who indicates that variation of *de* and *para* is also possible in contexts in which N1 designates a container, as in *cesto de/para basura* ('waste bin/bin for waste'). In her corpus data of Argentinian Spanish from Buenos Aires, Pacagnini (2003: 164) also finds constructions of the type *loción de/para limpieza* ('cleaning lotion') or *crema de/para hidratación* ('hydration crème'), in which the preposition expresses the utility of an object. Furthermore, she describes examples of the type *lápiz de/para labios* ('lipstick') and *esmalte de/para uñas* ('nail polish'), in which N1 represents an instrument. From this, Pacagnini deduces a schema in which, on a continuum between morphology and syntax, *de* lies closer to the morphological pole, whereas *para* is closer to the syntactic pole. In this paper, I can confirm Pacagnini's hypothesis that N Prep N constructions in Spanish show a certain internal variation in respect of the prepositions *de* and *para*, which might therefore be considered as lying at different points of a continuum between the morphological and the syntactic pole. In this case, it is evident that constructions with *para* are located closer to the syntactic pole than constructions with *de*. Pacagnini observes that 75 percent of the participants in her data used a determiner or a qualifying adjective with the preposition *para* in cases where N1 denotes an instrument, as in *loción de/para la limpieza* ('lotion for cleaning') or *esmalte para uñas sensibles* ('polish for sensitive nails') (Pacagnini 2003: 166). In the esTenTen corpus data, this type of variation between *de* and *para* does not occur at all. However, a closer look at the Portuguese data offers interesting findings:


'animals for slaughter'

The Portuguese data illustrate that variation of *de* and *para* is possible in a larger number of nominal semantic relations in Portuguese than in Spanish or French. On the one hand, Portuguese offers the same templates as French and Spanish: N2 as a user (object or person) and N2 as a specific purpose of N1. On the other hand, Portuguese provides additional templates, including N2 as a specific time or period of time, and N2 designating a specific determination for N1 (which in most cases is a living being). One additional template, N1 being an instrument for N2, is of particular interest. Here, we find the Portuguese example *produto de/para limpeza* ('cleaning product'), a type that Pacagnini cited for Argentinian Spanish. This template appears to be productive in Portuguese, as shown in the additional examples *creme de/para mãos* ('hand cream') and *máscara de/para cílios* ('mascara for eyelashes'). Although our Spanish data contradict Pacagnini's findings for Spanish, the same template can be found in Portuguese. Further investigation of this phenomenon is necessary, particularly in light of the possibility that the Spanish used in Buenos Aires, where Pacagnini collected her data, may be influenced by Portuguese from Brazil. Initial informal assessments by native Spanish speakers in Spain reveal that the template [N1instrument *de/para* N2]<sup>N</sup> is not productive in Spain and that the template [N1instrument *para* N2]<sup>N</sup> is considered incorrect.

The analysis of the variation between *de* and *para*, and *de* and *pour*, in Spanish, French, and Portuguese reveals that Portuguese has the largest number of templates at an intermediate lexical level for the variation of *de* and *pour/para*. The Spanish and French data overlap in their templates for the variation of *de* and *pour/para*, while the Spanish data from Buenos Aires (Pacagnini 2003) offer a slightly different picture. The analysis here supports the findings from the previous subsections on the semantic transparency of the constructions under investigation. The present analysis does not feature any (partially) opaque or (partially) idiomatic constructions. I mentioned at the beginning of this subsection that the prepositions *de* and *pour/para* vary in their semantic transparency; nevertheless, they undergo internal constituent variation in all three languages under investigation. While traditional accounts generally mention the different syntactic status of constructions containing *de* and *pour/para*, the constructionist approach introduced in Section 4 makes possible an unproblematic mapping of this internal constituent variation.

## **6 Conclusion**

The present study of internal constituent variation in N Prep N constructions allows numerous conclusions to be drawn about their nature in Romance languages and about the role and variability of the prepositional element. The discussion and analysis here have shown that it is not always possible or expedient to differentiate clearly between lexical and syntactic N Prep N constructions. In many cases, not even the numerous delimitation tests lead to a clear distinction. Therefore, the present account has abandoned this strict, dichotomous distinction in favor of a more holistic approach. When considering internal constituent variability, the determining factor is not the lexical or syntactic status of the elements; instead, it is the nominal semantic relation expressed via the preposition. Here, it is not crucial to differentiate between the lexical status, e.g. *libro de niños* ('children's book') and the syntactic status, e.g. *libro para niños* ('children's book'). In order to conduct a fruitful qualitative comparative analysis of N Prep N constructions in Romance languages, it is necessary to adopt a theoretical account that does not focus on the lexicon-syntax distinction. In the present paper, construction morphology, a constructionist approach that expands the notion of construction to the word level, offers the appropriate tools for analysis. Following Masini (2009: 261), N Prep N constructions are analyzed as abstract templates, which are, to some degree, productive and associated with a naming function. For the present analysis, a constructionist inheritance hierarchy has been adapted to internal constituent variation in one construction (see Section 4). The analysis focused on the intermediate lexical level, that is, the alternation between [N1 Prep1 N2] and [N1 Prep2 N2], in which Prep1 and Prep2 designate alternative prepositions. This constructionist approach revealed the possible templates for prepositional variation in three different languages: Spanish, French, and Portuguese.

The analysis of three alternating pairs, specifically *de* and *à/a*, *de* and *en/em*, and *de* and *pour/para*, demonstrates important differences and common features between the languages. The quantitative aspect, which was not the primary focus of this paper, demonstrates the high frequency and productivity of the different templates in Portuguese. This holds to a lesser extent in French and to an even lesser extent in Spanish. This result is in line with the results from Hennecke & Baayen (2017). The qualitative analysis demonstrates that, in the underlying datasets, Portuguese offers the greatest number of different templates for internal prepositional variation, followed by French, and then Spanish. In this connection, it should be mentioned that Portuguese also offers the largest number of constructions (or types) for each template. This result confirms the impression from the quantitative study that Portuguese N Prep N templates are frequent in speech and are very productive. From a qualitative perspective, it is striking that most templates of internal prepositional variation exist across languages. In the case of the pair *de* and *en/em*, the templates that allow internal prepositional variation vary only slightly between the languages. For variation between *de* and *à/a*, the French data show the greatest tendency to internal variation. This is mainly because the preposition *à* is relatively productive and frequent in French, which is not the case for Spanish and Portuguese. In cases where French relies on the preposition *à*, Spanish and Portuguese mostly employ the preposition *de*, as in Fr. *verre à vin*, Sp. *copa de vino* and Pt. *copo de vinho* ('wine glass', in each case). Spanish does not offer any internal variation of *de* and *a*, whereas Portuguese shows certain tendencies in this direction.
For the variation between *de* and *pour/para*, the French and Spanish data do not show any qualitative differences; that is, they overlap exactly in terms of which templates allow internal prepositional variation. Studies based on data from Argentinian Spanish indicated the existence of further templates; these were not found in the present data in Spanish, but many of them were present in the Portuguese data.

A very important finding from the qualitative analysis is that internal prepositional variation in the three languages is possible only for semantically transparent constructions. This can be explained by the fact that in opaque N Prep N constructions, the semantic relation between the nominal constituents often cannot be determined explicitly.

In conclusion, a constructionist approach to N Prep N constructions may solve certain problems in defining and delimiting these constructions in Romance languages. Furthermore, a constructionist approach allows an accurate investigation of the differences and common features of templates for internal prepositional variation in the three languages under investigation here. Future studies should investigate these templates in more detail, extending the approach to other types of internal variation.

## **References**


## **Chapter 6**

## **Production of multiword referential phrases: Inclusion of over-specifying information and a preference for modifier-noun phrases**

Christina L. Gagné Department of Psychology, University of Alberta

Thomas L. Spalding Department of Psychology, University of Alberta

J. Claire Burry Department of Psychology, University of Alberta

Jessica Tellis Adams KidsAbility Centre for Child Development, Ontario

> We examined the underlying psycholinguistic and cognitive factors that give rise to the production of multiword expressions. For example, if a story describes a woman buying a dog with blue fur, will people include the color of the dog when referring to the animal and, if so, in what syntactic form? In the experiment, participants read short stories that contained a concept that was presented as either a modifier-noun phrase (e.g., *the blue dog*) or full phrase (e.g., *the dog that was blue*). We also varied whether the property being highlighted was normal (e.g., *brown*) or distinctive (e.g., *blue*) for the head noun concept (e.g., *dog*). We found that participants are more likely to include distinctive properties than normal properties when referring to the concept. Although the selection of a syntactic form was partially influenced by the form of the information in the story, there was a strong overall bias toward using a modifier-noun phrase structure.

Christina L. Gagné, Thomas L. Spalding, J. Claire Burry & Jessica Tellis Adams. 2020. Production of multiword referential phrases: Inclusion of over-specifying information and a preference for modifier-noun phrases. In Sabine Schulte im Walde & Eva Smolka (eds.), *The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective*, 155–178. Berlin: Language Science Press. DOI:10.5281/zenodo.3598564

## **1 Introduction**

## **1.1 Aim and background**

The aim of this chapter is to explore when and how multiword expressions are used within a referential context. In particular, we focus on the production of referential expressions and examine what drives the inclusion of modifying information and the syntactic form of the expression. When referring to an object, person, or event, a speaker/writer is faced with the challenge of assigning linguistic labels to conceptual entities; often, several linguistic expressions can be used. For example, the same object can be referred to as *cup*, *ceramic cup*, or *cup that is made of ceramic*. What influences this decision? Two aspects of forming a referential expression are particularly relevant and will be the focus of our investigation. First, the speaker/writer might or might not include modifying information in the referential expression. Second, if modifying information is included, the expression might be a compound (e.g., *ceramic cup*) or a full noun phrase (e.g., *cup that is made of ceramic*). Although it is tempting to think of these as two separate ordered decisions (first decide whether or not to modify, then decide the form of the modification), we should note that these two aspects are not necessarily deliberate, conscious choices, nor need they be, strictly speaking, independent or sequential. Rather, the ultimate form of the expression may reflect underlying cognitive processes carried out within the language system that, working together, give rise to the form of the expression, and hence to both the syntax and the presence (or not) of modifying information.

Much of the existing work on compounds and modifier-noun phrases has focused on compound access and interpretation. The current study takes a different approach to this problem. Rather than focusing on the interpretation per se, we examine production to identify some of the expectations and biases that human users have about the use of modifying information during referential communication. When using referential expressions, speakers/writers attempt to establish both semantic co-ordination and lexical co-ordination with the addressee (e.g., Clark & Wilkes-Gibbs 1986; Garrod & Anderson 1987; Clark & Schaefer 1989). An attempt is made to synchronize the underlying mental model of the current situation as well as the specific expressions that are applied to particular entities within that model. In doing so, the speaker/writer draws on many different types of knowledge, including world knowledge, knowledge about information expressed in the conversation/discourse, and knowledge about linguistic conventions. Identifying the expectations that people have about the use of multiword expressions provides insight into how people are conceptualizing both the entities denoted by these constructions and the scenarios in which the constructions are or should be used. Consequently, this area of research has implications for a variety of areas within the psycholinguistic and linguistic literature. In particular, the current project contributes to research that examines the contribution of the individual constituents to the understanding of the meaning of the whole expression, and the appropriateness of the use of the whole construction in a given situation.

The semantic transparency of the constituents of a compound has been a widely studied aspect of compound processing (Libben 1998; Jarema et al. 1999; Gagné & Spalding 2016; Smolka & Libben 2017). In general, compounds with opaque constituents (e.g., *humbug*) are more difficult to process than compounds with transparent constituents (e.g., *schoolyard*). Of course, in creating a multiword referential phrase that is new (as opposed to a known compound word, for example), the constituents will need to be relatively transparent in order to provide the information that would allow the communicative task to be successfully completed. However, at any given level of transparency there are other aspects that will influence whether a head noun is modified. In the current chapter, we will consider one of these factors, namely the distinctiveness of the property denoted by the modifier, which, like semantic transparency, is a semantic factor. Both *blue dog* and *brown dog* are semantically transparent expressions in that the meanings of the constituents contribute to the meaning of the whole. However, *blue dogs* are more distinctive relative to the concept *dog* than are *brown dogs*. We explore whether people are sensitive to the distinctiveness of a property during the formation of multiword expressions.

## **1.2 Overview of the chapter**

In this chapter, we begin by providing an overview of the theoretical issues concerning the inclusion of modifying information and the use of either full phrases or modifier-noun phrases. Next, we present an experiment in which we manipulated two factors that might influence the production of referential expressions. In particular, we examined whether the distinctiveness of the modifying information influences whether that information is used when referring to the antecedent. In addition, we examined whether the syntactic form in which the modifying information is presented influences the form in which modifying information is conveyed. Finally, we discuss the relevance of the empirical data within a psycholinguistic context and highlight the implications of the data for multiword expressions and for modifier-noun phrases in particular.

## **1.3 What motivates the inclusion of modifying information?**

The expressions used to denote referents reflect how the speaker/writer is conceptualizing the object and, in particular, how he/she chooses to distinguish it from other items (Brown 1958; Olson 1970). Indeed, speakers are sensitive to both nonlinguistic and linguistic ambiguity during referential communication and attempt to avoid producing ambiguous expressions (Ferreira et al. 2005). A key issue for the current research concerns the factors that lead people to include modifying information rather than using an unmodified noun when producing a referential expression. The inclusion of modifying information serves several linguistic and psychological functions. Most often, modifying information is used to distinguish among potential referents (Downing 1977; Brekle 1986). There are often situations in which using the category label alone would not be sufficient. Consider a situation in which there are several cups on a table. To refer to a particular cup, for example, a speaker might specify its material and use either a full noun phrase (e.g., *May I have the cup that is ceramic?*) or a compound (e.g., *May I have the ceramic cup?*). Both utterances involve combining information about the head noun concept (e.g., *cup*) with information about a modifying concept (e.g., *ceramic*). This combination of information, in turn, allows the unambiguous identification of the referent within the available set of potential referents.

Several experiments on referential communication that used a visual display of objects (Tanenhaus et al. 1995, see also Frank & Goodman 2012) have found that speakers use a pre-nominal adjective (e.g., *tall glass*) in a context in which there are contrasting members (e.g., *a short glass*), which is consistent with the hypothesis that speakers try to make their utterances as informative yet as economical as possible (Grice 1975). The pre-nominal adjective is used to uniquely identify one object among several objects. However, the motivation for using modifying information appears to go beyond merely disambiguating among multiple possible referents because it is often included even when there is no need to provide additional information. This phenomenon of providing modifying information even in cases where such information is not needed to identify the referent is known as over-specification. Indeed, there are a number of studies showing that participants include adjectives during referential communication even though this additional specification is not required to identify the referent (e.g. Pechmann 1989; Sedivy 2003; Maes et al. 2004; Koolen et al. 2013).

Over-specification performs various functions in addition to identifying referents. For example, modifying information (e.g., *the cup that is on the shelf near the plate*) is used to shift the addressee's focus of attention (Ariel 1990; Prince 1992; Gundel et al. 1993; Chafe 1994). Another reason for using modifying information is to conform to pre-established conversational pacts (Brennan & Clark 1996; Ibarra & Tanenhaus 2016). Conversational partners often converge on an expression and will persist in using that expression even when there is no longer a need to include the additional information. To use an example from Brennan & Clark (1996), the term *pennyloafer* was initially used to denote a particular shoe among other possible shoes. However, the speaker continued to use this term rather than switching to using the simpler term *shoe* even when no other shoes were present in the display.

From a cognitive processing perspective, over-specification appears beneficial to both the speaker and the listener. For example, it aids in the identification of objects in a visual array and, consequently, speakers are more likely to produce over-specified expressions when they are asked to imagine that the task is very important (i.e., when told to imagine that the control panel is being used for long-distance surgery) than when they are not given such a scenario (Arts et al. 2011). Over-specification also benefits production (Pechmann 1989). Consistent with this idea, redundant information is more likely to be included when the speaker is under time pressure. Koolen et al. (2016) conducted a study in which participants referred to target objects in a visual array of objects. Participants were more likely to use over-specifying information when they were under a time constraint (e.g., they had to respond within 1000 ms) than when they had an unlimited amount of time to refer to the target object. Koolen et al. (2016) concluded that when individuals are under pressure, they are more likely to use quick heuristics and therefore select properties of an object based on their perceptual salience rather than discriminatory power.

Overall, there appear to be many reasons why speakers might choose to include modifying information in referential expressions. In the current experiment, we focus on an additional use of modifying information that has not been fully explored in the literature. In particular, we propose that modifying information might be used to mark a conceptual distinction among category members and, in particular, to make explicit note of particularly distinctive information.

Studies on referential expressions within a visual context (i.e., situations in which objects are presented visually) indicate that the distinctiveness of visual properties within the display influences referential expressions. Participants were more likely to provide modifying information (i.e., to produce over-specified expressions) when the property of an object was atypical (e.g., Westerbeek et al. 2015). For instance, Rubio-Fernández (2016) used a referential communication task in which participants asked the researcher to click on objects that were presented in an array on the computer screen. In the first experiment, participants saw pictures of paper dolls and a display of paper clothes that were either all the same color (e.g., *brown purse*, *brown shirt*, *brown dress*, and *brown shoes*) or different colors (e.g., *yellow purse*, *pink shoes*, *blue dress*, and *red pants*). In the second experiment, participants saw arrays of animals, fruits, vegetables, and artifacts that either had typical colors (e.g., *brown camel*) or atypical colors (e.g., *blue camel*). Participants tended to use a redundant color adjective in instances where such modifying information would be unnecessary (e.g., *the blue dress*, where only one dress could be a possible referent) more often when the object was an atypical color than when the object was a stereotypical color. These results suggest that modifying information is used when the concept has been modified with a distinctive property. Furthermore, participants provided modifying information more often when the color was a central property of the object category (e.g., referents such as clothing yielded a higher usage of redundant color adjectives than did geometrical figures). Taken together, these results suggest that a key characteristic in terms of determining whether modifying information is provided is conceptual distinctiveness rather than perceptual/visual distinctiveness. That is, what matters is the distinctiveness of the information relative to the category itself, rather than just within the visual display.

The aim of the current study is to explore the role of conceptual distinctiveness by examining whether the tendency to mention distinctive properties extends to situations in which the objects are not physically present. In particular, we will focus on a situation in which the contrast with other category members is implied or based on conceptual knowledge within a story context, rather than presenting the objects in a visual display. For example, mentioning that flowers are either fresh or wilted implicitly contrasts the flowers with ones that are not fresh or not wilted. Moreover, in the context of buying flowers as a gift, it is more typical to buy ones that are fresh than ones that are wilted. Thus, from a conceptual perspective, the property *wilted* is more distinctive for flowers than is *fresh*.

Conceptual distinctiveness is related to the issue of contrast. The notion of contrast between categories and subcategories has long played an important role in linguistic and psycholinguistic theories. Indeed, the principles of contrast and mutual exclusivity (Clark 1983; Carstairs-McCarthy 2010) are well-known constraints on word learning. In terms of multiword expressions, previous research on conceptual combination suggests that the notion of contrast influences how people use noun phrases. For example, Gagné & Murphy (1996) found that when verifying whether a property is true of a modifier-phrase (e.g., *submarine door*), people took less time to verify a property that was true of the phrase but not generally true of the head noun (e.g., *made of metal*) than to verify a property that was true of both the phrase and the head noun (e.g., *solid*). This finding suggests that people are sensitive to the extent to which the modified concept (e.g., *submarine door*) is semantically/conceptually distinctive from other members of the head noun concept (e.g., *door*).

In terms of judgments about whether a concept has a particular property, several studies (Connolly et al. 2007; Gagné & Spalding 2011; 2014b; Hampton et al. 2011; Jönsson & Hampton 2012) have shown that properties that are true of the head noun (e.g., *kites have strings*) are viewed as being less true of the modified head (e.g., *silk kites have strings*). This effect (known as the modification effect) appears to be driven by the expected level of contrast between the combined concept (e.g., *silk kites*) and the head concept (e.g., *kites*); when making judgments about the likelihood that a property is true, participants are influenced by the meta-knowledge that modified concepts are used to signal that the subcategory is similar to the category (e.g., *silk kites have many properties in common with kites*) but also that the subcategory is somehow different from the category (Gagné & Spalding 2011; 2014b; Spalding & Gagné 2015). These two expectations account for why properties that are true of the head noun are judged as being less true of the modified concept, and why properties that are false of the head noun (e.g., *candles have teeth*) are judged to be more true (but still unlikely) of the modified concept (e.g., *purple candles have teeth*). Indeed, the effect of the expected contrast is so strong that the same effects are seen even when the modifier is a non-word (e.g., Gagné & Spalding 2015).

Thus, we conclude that conceptual contrast or conceptual distinctiveness is a critical factor in the use and understanding of multiword phrases and compound words in general and is therefore likely to contribute to the production of such phrases.

## **1.4 When modifying information is included, how is it expressed?**

If modifying information is included, the syntactic form which expresses this information can still vary. In English, modifying information can be expressed as a full noun phrase (e.g., *a dog that is blue*) or as a modifier-noun phrase (e.g., *a blue dog*). Do people have a priori biases toward using one linguistic expression over another? The answer is not immediately obvious because intuitions based on ease of processing do not correspond with the tendency for expressions to become shortened over time.

In terms of ease of processing, there is an advantage to using a full phrase because noun compounds are particularly challenging to interpret (Lapata 2002; Copestake & Briscoe 2005; Libben 2014). Much of the difficulty lies in recovery of an implicit underlying relation between the modifier and head noun concept. A modifier-noun phrase is more ambiguous than a full phrase, in that the full noun phrase explicitly describes the exact nature of the modification that is being performed (e.g., *oil for babies*) whereas, for modifier-noun phrases (e.g., *baby oil*) the nature of the modification is implicit and must be reconstructed by the listener/reader (see Levi 1978; Gagné & Shoben 1997). The term "modifier-noun phrase" most often refers to constructions that are novel (e.g., *apple juice seat*; *mountain magazine*), but can also refer to lexicalized open (spaced) compounds (e.g., *hunting dog*; *paper bag*). Indeed, there seem to be commonalities in the processing of novel noun phrases and lexicalized compounds (Gagné & Spalding 2006). Psycholinguistic research has shown that human language users actively make use of relations during the processing of both novel and established/lexicalized compounds (Gagné & Shoben 1997; Gagné 2002; Gagné & Spalding 2009; 2014a). This research indicates that, during the comprehension of noun compounds, the more available the required relation is, the easier it is to select the relation and, consequently, the less time it takes to interpret the compound. In other words, the more difficult it is to recover the implicit underlying relation, the more difficult it is to interpret a compound (see, for example, Gagné & Shoben 1997; Spalding & Gagné 2014; Schmidtke et al. 2018).

Given the difficulty inherent in recovering implicit semantic relations, one would presume that it would be advantageous to overtly express the relation and, consequently, to avoid the use of compounds. Yet, this is not what happens within the human language system. Over time, lexicalized phrases are often truncated and become compounds (e.g., *our lady's bug* became *ladybug*). Similarly, compounds can become non-compounds (e.g., *electronic mail* became *e-mail* and, more recently, *email*); the words *lord* and *lady* are derived from the Old English compounds *hlaf-weard* 'bread-keeper' and *hlæfdige* 'bread-kneader'. This truncation that occurs on a global (and more long-term) level within a language also occurs during local interactions. During referential communication, for example, linguistic expressions are often shortened (Garrod & Anderson 1987; Brennan & Clark 1996). For example, in one experiment, a geometric figure that was initially described as looking *… like a person who's ice skating, except they're sticking two arms out in front* became *the ice skater* (Clark & Wilkes-Gibbs 1986). Similarly, an object that was initially referred to as *the car that has like … blueprints painted on the side of it sorta* was later referred to as *the blueprint car* (Metzing & Brennan 2003). In sum, there appears to be a preference toward using syntactically simpler expressions such as compounds, even though such expressions are inherently more ambiguous than full expressions which specify the relation overtly.

On the basis of these findings, one would expect an overall bias towards using a truncated expression (e.g., using *wilted flowers* or even *flowers*, rather than *flowers that are wilted*). However, this bias must also be considered in light of another bias reported in the literature, namely the tendency for people to reuse recently encountered syntactic structures. For example, Bock (1986) demonstrated that speakers tend to re-use a syntactic structure from the priming sentence when describing a scene. This effect has been examined in a variety of contexts, including examinations of whether it can be driven by a single word, as in the case of featural accounts of syntactic priming. For example, Melinger & Dobel (2005) found that production preferences for dative alternation can be biased by prior exposure to a single verb. However, most relevant for the current project are studies that focus on the creation of referential expressions. Syntactic convergence occurs during referential communication. For example, participants were more likely to describe a picture of a red sheep as *The sheep that's red* when the confederate recently described a picture of a red door as *The door that's red* than when it was described as *a red door* (Cleland & Pickering 2003). This result suggests that participants tend to re-use syntactic structures, especially when the prime and target sentences share lexical items such as *red* (see also Chang et al. 2003). Similarly, Tarenskeen et al. (2015) found that when participants use modifying information to describe a target item from a visual array of six drawings of clothing, there is a tendency to continue to re-use the same syntactic structures.

These studies all demonstrate that participants have a tendency to re-apply the same syntactic structure that was used with one object/entity (e.g., *door*) when subsequently referring to a separate object/entity (e.g., *sheep*). However, an unresolved question concerns whether syntactic priming will occur in a task in which participants are introduced to a concept (e.g., *apples that are rotten*) and then are asked a question requiring them to refer to that same concept. This situation directly pits the bias towards truncation against the bias towards reusing syntactic structures. The current experiment will investigate this issue.

## **2 Experiment**

## **2.1 Overview and rationale**

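We examine the types of referring expressions that people produce when referring to a concept that has been encountered in a short description of a scenario. The experiment was designed to address two key issues: whether the distinctiveness of the modifying information influences whether that information is included when referring to the antecedent, and whether the syntactic form in which the modifying information is presented influences the form in which it is subsequently conveyed.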


Participants read short stories and then answered a comprehension question that would require them to refer to something in the story. For example, one story described a woman buying a pet. The target antecedent was the dog that she purchased. We varied the type of modifying information that was presented with the target antecedent. The information was either normal or typical for the object or was distinctive. To illustrate, all participants read a version of the story in which the color of the dog was mentioned. For half of the participants, the dog was described as having brown fur (a normal feature for dogs), and for the other half, the dog was described as having blue fur (a distinctive feature for dogs). We were interested in what the participants would produce when they were asked *What kind of pet did Sally buy?*

We predict that distinctiveness will influence whether participants choose to include modifying information in their linguistic expression. Properties that are unusual or distinctive for the head noun will be seen as especially relevant and, consequently, will be more likely to be included in the description provided by the participants. However, properties that are not unusual will be deemed less relevant (because the majority of members of the head noun category have the same property) and therefore less likely to be included. Thus, when referring to a dog that was previously mentioned in a short story, participants will be more likely to include modifying information when the dog was described as having an atypical color such as *blue* relative to when the dog was described as having a typical color such as *brown*, because the resulting subcategory is more distinctive and therefore will tend to more readily identify the appropriate referent. In short, there are lots of brown dogs, but relatively few blue dogs in the world, and, consequently, it should be more informative to refer to the subcategory of *blue dogs* than to the subcategory of *brown dogs*. Note, however, that in no case is the modifying information required to uniquely identify the referent.

In terms of the syntactic form that is used to convey the modifying information, the existing literature points to two conflicting predictions. On one hand, people might show a tendency toward using a modifier-noun phrase even when the information is presented as a full noun phrase. Two considerations arise here. First, the modifier-noun phrase is shorter and syntactically simpler and, thus, might generally be preferred. Second, a modifier-noun phrase is more ambiguous than a full phrase, in that the full noun phrase explicitly describes the exact nature of the modification that is being performed (e.g., *a dog that is blue*) whereas, for modifier-noun phrases the nature of the modification is implicit and must be reconstructed by the listener/reader (Downing 1977; Levi 1978). Having the relation directly specified (e.g., *crayon that is made of plastic*, or *sunshine in the morning*) removes uncertainty about relation selection (Gagné & Spalding 2014a; 2015). Thus, there could be some trade-off, in which speakers or writers generally prefer to use the shorter modifier-noun phrase, as long as they have reason to believe that the recipient will understand the implied connection between the modifier and the head noun concepts. Gagné & Spalding (2004) found that the presence of a referent in a discourse made modifier-noun phrases easier to comprehend, even though the phrase itself had not been presented. In the present study, all of the stories include information (either in the form of the full noun phrase or the modifier-noun phrase) that should make it easy for a recipient to understand the modifier-noun phrase. Therefore, the participants, in responding to the question about the target antecedent, might show a general preference for the modifier-noun phrase.

On the other hand, the form in which the information was initially presented in the preceding discourse might influence the manner in which the information is later conveyed due to syntactic priming. That is, people should be more likely to produce a modifier-noun phrase when the information is presented as a modifier-noun phrase than when it is presented as a full noun phrase. This prediction is derived from research on the activation of syntactic structure during speech production, which demonstrates that speakers tend to reuse a syntactic structure from the priming sentence when describing a scene (Bock 1986; Bock & Loebell 1990).

## **2.2 Method**

### **2.2.1 Participants**

Fifty-four introductory psychology students participated for partial course credit. All participants were native speakers of English. The data from two participants were not used because they did not follow instructions. Thus, data from 52 participants were included in the analyses.

### **2.2.2 Materials and procedure**

Twenty-eight short stories were constructed. Each story was under 65 words long and contained a target antecedent (i.e., the antecedent that we aimed to elicit) for which we provided modifying information. We varied whether the modifying information was distinctive (e.g., *blue fur*) or usual (e.g., *brown fur*) for the head noun (e.g., *dog*) in the context of the story. In addition, we varied the syntactic form in which the modifying information was presented: the information was presented as a modifier-noun phrase (e.g., *brown dog; blue dog*) or full noun phrase (e.g., *a dog that is brown; a dog that is blue*). These two variables were crossed, yielding four experimental conditions. For example, one story was:

Sally loves animals. She decided to get a pet. So she went to the pet store to see what was there. Sally immediately set her eyes on a [*blue dog/brown dog/dog that was blue/dog that was brown*]. She picked him up and knew instantly that he was going to be a great companion for her.

Only one of the expressions within the square brackets was presented to a particular participant. The items were counterbalanced such that each participant saw an equal number of stories in each of the four conditions and each item was seen only once by each participant. Order of presentation was randomized for each participant. The full list of target items (i.e., the unusual property, the normal property, and the head noun for each story) is given in the Appendix.
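The counterbalancing described above can be illustrated with a short sketch. The Latin-square rotation, the function names, and the seeding by participant number are assumptions made for illustration only; the text specifies the constraints (equal numbers of stories per condition, each item seen once, randomized order) but not the procedure used to satisfy them.

```python
import random
from collections import Counter

# The four cells of the 2 (Distinctiveness) x 2 (Form) design.
CONDITIONS = [
    ("distinctive", "modifier-noun"),
    ("usual", "modifier-noun"),
    ("distinctive", "full phrase"),
    ("usual", "full phrase"),
]
N_ITEMS = 28         # stories
N_PARTICIPANTS = 52  # participants retained for analysis


def build_list(list_index):
    """Map each item to a condition for one counterbalancing list.

    Rotating the condition by list_index (an assumed Latin-square scheme)
    puts every item into a different condition on each of the four lists.
    """
    k = len(CONDITIONS)
    return {item: CONDITIONS[(item + list_index) % k] for item in range(N_ITEMS)}


def presentation_order(participant_id):
    """Return a participant-specific random order of the items."""
    rng = random.Random(participant_id)  # seeded only for reproducibility
    return rng.sample(range(N_ITEMS), N_ITEMS)


lists = [build_list(i) for i in range(len(CONDITIONS))]
items_per_condition = Counter(lists[0].values())
total_responses = N_PARTICIPANTS * N_ITEMS
print(items_per_condition)
print(total_responses, total_responses // len(CONDITIONS))
```

Under this scheme each participant sees seven stories per condition, and the design yields 1456 responses in total, 364 per condition.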

Participants viewed the stories one at a time on a computer screen. They were instructed to read each passage carefully and were allowed as much time as necessary to complete the task. After each story, participants answered two questions about the story. The first question required people to recall the referent of the target noun phrase from the story. It specifically required the participant to respond by describing the target concept. For example, a question might ask "What kind of pet did Sally get?" The participant typed in their answer. The second question was also associated with the passage, and asked about another aspect of the story.

## **2.3 Results**

Two of the authors classified the responses into four categories based on how the participants referred to the antecedent: modifier-noun phrase (e.g., *blue dog* or *brown dog*), full phrase (*dog that is blue* or *dog that is brown*), head noun only (*dog*), and "other". Three main types of responses fell under the "other" category. The first were responses that did not provide a specific answer (e.g., "I don't know", "it doesn't say"). The second were responses that did not address the question (e.g., "What does Nathan cut quickly?" was intended to elicit either green or yellow grass, but the participant responded "because his parents are coming home"). The third were responses that did not directly refer to the target referent (e.g., "What does Katie wear to keep her feet warm?" was intended to elicit snake slippers or soft slippers, but the participant responded "fuzzy slippers").


Inter-rater agreement was 100%. Table 1 displays the number of responses (for each condition) in each category. Overall, participants generally did include modifying information; modifying information was provided in 962 out of 1456 responses, and the vast majority (84%) of these responses were in the form of a modifier-noun phrase. The responses that were coded as "other" were not included in further analyses and, thus, the percentage with which a category was used within each of the four experimental conditions was calculated based only on responses in the form of a modifier-noun phrase, full phrase, and head noun only.

Table 1: Number of responses and row percentages (in parentheses) for each condition that were modifier-noun phrase, full phrase, head noun only, or other. Each row sums to 364.


We conducted two separate analyses. The first analysis focused on whether Form and Distinctiveness affected the likelihood of including modifying information. The second analysis examined whether Form and Distinctiveness influenced the form (e.g., full phrase vs. modifier-noun phrase) in which the modifying information was conveyed. In both analyses, the dependent variable was binary (i.e., is modified vs. not modified for the first analysis, and compound vs. phrase for the second analysis) and, consequently, we used the *melogit* function in Stata 15 to fit a mixed-effects model for binary responses. The experimental variables, Form and Distinctiveness, were included as fixed effects, and subjects and items were included as crossed random effects. The estimates of the fixed effects are reported as log odds.
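As a minimal sketch of how the four response categories map onto the two binary dependent variables, the following may help. The constant and function names are hypothetical; the coding follows the description in the text, with 1 marking a modified response in the first analysis and a modifier-noun response in the second, and with "other" responses excluded from both.

```python
# Response categories from the classification step.
MODIFIER_NOUN = "modifier-noun phrase"
FULL_PHRASE = "full phrase"
HEAD_ONLY = "head noun only"
OTHER = "other"


def code_inclusion(category):
    """Analysis 1: 1 if the response includes modifying information, else 0."""
    if category in (MODIFIER_NOUN, FULL_PHRASE):
        return 1
    if category == HEAD_ONLY:
        return 0
    return None  # "other" responses are excluded


def code_form(category):
    """Analysis 2: 1 for a modifier-noun phrase, 0 for a full phrase."""
    if category == MODIFIER_NOUN:
        return 1
    if category == FULL_PHRASE:
        return 0
    return None  # head-noun-only and "other" responses are not analysed


responses = [MODIFIER_NOUN, HEAD_ONLY, FULL_PHRASE, OTHER]
print([code_inclusion(r) for r in responses])
print([code_form(r) for r in responses])
```

Returning `None` rather than 0 for excluded responses keeps them from being silently counted as unmodified or as full-phrase responses.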

To examine whether the syntactic form (e.g., *wilted flowers* vs. *flowers that are wilted*) in which the information had been presented in the story and the distinctiveness of the property influenced the likelihood of including modifying information when referring to the antecedent, we fit a model in which the dependent variable was whether the participant's response included modifying information; modifier-noun phrase and full phrase responses were coded as 1 and the head noun only responses were coded as 0. Both the distinctiveness of the property and the form in which the information was presented in the story influenced whether modifying information was included in the response. Participants were more likely to provide modifying information when the property presented in the story was distinctive (e.g., *wilted* as a property of *flowers*) rather than usual (e.g., *fresh* as a property of *flowers*), 86% vs. 61%, $b = 1.48$, SE = 0.24, $z = 6.22$, $p < 0.0001$, and when the property had been presented as a modifier-noun phrase rather than a full phrase (81% vs. 67%), $b = -1.15$, SE = 0.19, $z = -5.96$, $p < 0.0001$. The two predictor variables (Form and Distinctiveness) did not interact with each other, $b = 0.46$, SE = 0.31, $z = 1.48$, $p = 0.14$.
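Because the fixed effects are reported as log odds, it can help to see how such an estimate relates to an odds ratio and to its test statistic. The sketch below assumes that the first value in each reported triple is the log-odds estimate and that the test statistic is a Wald $z$ (estimate divided by standard error) with a two-sided normal p-value; `wald_summary` is a hypothetical helper, not part of the original analysis.

```python
import math


def wald_summary(estimate, se):
    """Summarise a log-odds estimate with its odds ratio, Wald z, and p."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"odds_ratio": math.exp(estimate), "z": z, "p": p}


# Rounded values as printed in the text above.
effects = {
    "Distinctiveness": (1.48, 0.24),
    "Form": (-1.15, 0.19),
    "Distinctiveness x Form": (0.46, 0.31),
}
for name, (estimate, se) in effects.items():
    s = wald_summary(estimate, se)
    print(f"{name}: OR = {s['odds_ratio']:.2f}, z = {s['z']:.2f}, p = {s['p']:.4f}")
```

Values recomputed from the rounded estimates and standard errors differ slightly from the reported statistics, which were presumably computed from unrounded model output.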

The second analysis was conducted using only the responses that included modifying information (i.e., only the full phrase and modifier-noun responses) so that we could test whether the form in which the modifying information was presented in the story and the distinctiveness of the property influenced the way in which participants conveyed the modifying information in their response. The dependent variable corresponded to whether the response was a modifier-noun phrase (1 = modifier-noun phrase and 0 = full phrase). Participants were more likely to provide a modifier-noun response when the story used a modifier-noun form (predicted probability = 0.99, SE = 0.009) than when the story used a full phrase form (predicted probability = 0.64, SE = 0.05), $\chi^2(1) = 31.47$, $p < 0.0001$. Note that because there are only two levels of the variable, the reverse is also true: namely, that participants are more likely to provide a full-phrase response when the story used a full-phrase form than when the story used a modifier-noun form. The type of property used in the story did not strongly influence whether participants used a modifier-noun form, $\chi^2(1) = 3.05$, $p < 0.08$.

Distinctiveness and Form interacted, $b = -1.99$, SE = 0.58, $z = -3.41$, $p = 0.001$, and, therefore, we examined the simple effects at each level of Form. Distinctiveness of the property had no effect on whether the response was a full phrase or modifier-noun phrase when the modifying information was presented as a full phrase, $\chi^2(1) = 2.43$, $p < 0.12$. However, when the modifying information was presented as a modifier-noun phrase, the response was more likely to be a modifier-noun phrase when the property was unusual/distinctive than when the property was normal, $\chi^2(1) = 8.91$, $p < 0.003$.

## **3 Discussion**

We explored two aspects of the production of multiword referential expressions: inclusion of modifying information and syntactic form, with a particular focus on modifier-noun phrases (e.g., *blue dog* and *brown dog*) and full noun phrases (e.g., *dog that is blue* and *dog that is brown*). The experiment directly pitted the bias towards truncation against the bias towards re-using syntactic expressions. The findings make three primary contributions to the literature on multiword expressions. First, we demonstrate the influence of semantic/conceptual knowledge on the inclusion of modifying information. In particular, the degree of conceptual contrast seems to be critical in determining whether modifying information is included when the referential expression is produced. Second, our results reveal the primacy of modifier-noun phrase constructions (over full phrase constructions) as a means of conveying that information. Third, while it is possible that there are small effects of syntactic repetition, or a general bias to use shorter syntactic forms for a reference to an already identified object from the story, the bias towards the modifier-noun phrase appears to be the main driver of the syntactic form of the referential expression, at least in this particular communicative task.

## **3.1 Including modifying information**

Previous research using visual displays of objects found that over-specification was more likely when a property was visually distinctive or salient such as when one object was a different color than other objects in the display (e.g., in a visual display in which one dog is blue and the others are orange). The current results extend this finding to a situation where the objects are not physically present and the distinctiveness of a property is based on conceptual knowledge about the modifier and head noun concepts. For example, *blue* is distinctive for *dogs* but not for *skies*. The knowledge needed to determine distinctiveness comes from past history and knowledge of the concepts involved rather than from visual information that is presented in the experiment. Therefore, our finding suggests that people are sensitive to conceptual distinctiveness in addition to (as shown in previous research) referential distinctiveness. To illustrate, in general language usage, a category name (e.g., *dog*) typically refers to a generic type (i.e., to the category of dogs). However, in our study, the referent was always a particular category member, not a generic category. Whether participants used a generic label or a modified construction depended on the distinctiveness of the property (relative to the head noun category) used in the story. In this respect, our data highlight the role of a particular type of implicit information, namely knowledge about the nature of the category-subcategory similarity. In particular, the category label (i.e., *dog*) was used when the particular referent in the story was not unusual; that is, when the entity being described was similar to the generic representative of the category. Note that the modifying information was not required to uniquely identify the referent (i.e., there was only one dog in the story), yet participants often opted to include this information, especially when it was distinctive. Thus, the inclusion of modifying information corresponded to a conceptual distinction rather than a purely referential one in that participants were sensitive to semantic and conceptual knowledge about the category to which the referent belonged.

### Christina L. Gagné, Thomas L. Spalding, J. Claire Burry & Jessica Tellis Adams

There are several possible reasons why participants were more likely to provide overspecified expressions when the referent had a distinctive property than when it had a normal property. One possibility is that the distinctive properties are just much more salient. For example, work on memory has suggested that features that violate expectations are often noticed and remembered particularly well (e.g., a skull in an office setting, see Brewer & Treyens 1981). In general, people make note of properties that are not similar to those they have seen before and, when communicating, they might prefer to explain these differences to others in the simplest way possible (Garrod & Anderson 1987; Markman et al. 1997). In the current experiment, the distinctive properties might have been more noticeable than normal properties, and this difference might have prompted participants to include them in their response. Another possible explanation is that the distinctive features are more likely to be incorporated into the representation of the target referent because they tend not to be true of the head noun. This explanation is consistent with previous research on novel combined concepts that suggests that features that are true of the entire phrase but not of the head noun in general (e.g., *white* for *peeled apples*) are more available than features that are true of the head noun (e.g., *round*) (Springer & Murphy 1992; Gagné & Murphy 1996) and also with evidence suggesting that people strongly expect property differences between things named with modified and unmodified nouns (Gagné & Spalding 2011; 2014b; Spalding & Gagné 2015). In our experiment, the normal properties were ones that tended to be true of the head noun concept, whereas the distinctive properties were not generally true of the head noun concept. Thus, it is possible that the distinctive property was more likely to be integrated into the representation of the target referent than was the normal property. If so, then distinctive properties would be more likely to be included in the participant's response than would normal properties.

## **3.2 Selection of syntactic form**

There is some tendency to reproduce the syntactic form in which the information was first presented; responses using a modifier-noun phrase are common, but are even more frequent when the story also uses a modifier-noun phrase than when the story uses a full phrase. Furthermore, although responses using a full phrase were relatively rare, the vast majority of responses that used a full phrase (*n* = 144) were produced when the story also used a full phrase, whereas only 7 responses using a full phrase were produced when the story did not use a full phrase. This finding is consistent with previous research on syntactic priming (Bock 1986; Bock & Loebell 1990) that found that people are more likely to produce passive constructions when describing a scene when previous sentences contained passive constructions than when previous sentences did not contain passive constructions. The current experiment examined part of a sentence, namely, the structure of a noun phrase, and also found support for syntactic priming.

However, the selection of syntactic form was not completely determined by the form presented in the story. Instead, there was a strong preference toward using a modifier-noun phrase (e.g., *wilted flowers*) rather than a full noun phrase (e.g., *flowers that are wilted*). Previous work on referential communication has indeed shown an overall trend towards the use of shortened expressions (Brennan & Clark 1996; Markman et al. 1997) and analyses of text corpora also show evidence of text compression (Marsh 1984). Thus, the preference for a modifier-noun phrase might reflect a tendency to select a syntactically simpler construction. Modifier-noun phrases are syntactically simpler than full noun phrases and yet still provide information that allows the reader/listener to identify a subcategory of the head noun (e.g., *ceramic cup* refers to a particular subcategory of the category *cup*). Thus, modifier-noun phrase constructions offer a balance between syntactic simplicity and informativeness. At the same time, there was little evidence to suggest that participants selected a head noun only structure over a modifier-noun structure, even though head noun only structures are syntactically simpler than modifier-noun phrase structures. That is, rather than exhibiting an overall bias towards shortening, per se, our data indicate a bias towards modifier-noun phrase use, which suggests that modifier-noun phrases might have a special status in the language. Although full phrases (e.g., *flowers that are fresh*) were almost always shortened (to either a modifier-noun phrase or noun, e.g., *fresh flowers* or *flowers*), modifier-noun phrases were rarely shortened to noun only. Thus, the use of a modifier-noun phrase rather than a full phrase might reflect something about the special status of modifier-noun phrases rather than a general bias toward syntactically simple constructions, per se. That is, it seems likely that modifier-noun phrases are particularly useful for conveying subcategory information. People are sensitive to overt cues that indicate the existence of a contrast set, such as the presence of the word *only*, and the inclusion of this cue affects the relative ease of resolving main clause/reduced relative clause ambiguities (Sedivy 2002). Perhaps the inclusion of modifying information in the context implied the existence of a contrast set. This might have encouraged people to use a modifier-noun phrase when referring to the target referent because this construction indicates a contrast set (Markman 1991).

In sum, we see some evidence for syntactic priming in that the form of the presentation in the story could reduce the bias toward producing modifier-noun phrases, but the influence of the prior form was relatively weak in that it was not able to overturn the strong preference for modifier-noun phrase constructions. Similarly, although we see some degree of shortening of the referring phrase, there still seems to be a preference for maintaining at least a modifier-noun construction, rather than just a generic noun, even though no modifying information was required in order to identify the referent in the story. This was particularly true when the modifying information was atypical.

## **4 Conclusion**

Our data reveal that the context in which linguistic expressions are used provides useful cues as to the form that the linguistic expression will take, and they provide insight into the expectations/biases that language users rely on during referential communication. During conversation and referential communication, modifier-noun phrases (e.g., *rotten apple*) are produced for several reasons, including distinguishing among potential referents and maintaining conversational pacts. The current experiment demonstrates that modifier-noun phrases are also produced in order to highlight conceptually distinctive properties. The finding that distinctiveness influenced the use of modifying information provides insight into how people use multiword expressions to convey information about how they are conceptualizing the various entities about which they are communicating. In particular, the form of the linguistic construction (e.g., noun versus modifier-noun phrase) provides useful cues as to the intended meaning. Furthermore, although the participants were somewhat sensitive to the syntactic form with which the target was presented, there was a strong bias for the modifier-noun phrase form. In sum, it appears that modifier-noun phrases have a privileged status among multiword expressions and provide a good compromise between competing principles of conveying sufficient information and using simple syntactic structures.

## **References**


## **Appendix**


Table 2: Full list of target items showing the unusual properties, normal properties, and the head noun.

## **Chapter 7**

## *Can you reach for the planets or grasp at the stars?* **– Modified noun, verb, or preposition constituents in idiom processing**

Eva Smolka University of Konstanz, Germany

## Carsten Eulitz

University of Konstanz, Germany

Idioms are a special case of multiword expressions in that their meaning cannot be compositionally constructed from the meaning of the single constituents. The present study examines whether the figurative meaning of an idiom is recognized if critical idiomatic constituents (e.g. noun, verb, preposition) are modified. In three paraphrase experiments, participants saw (a) the canonical idiomatic phrase (e.g., *She reached for the stars*), (b) the idiomatic phrase with a modified constituent (e.g., *She reached for the planets*), or (c) a matched literal control sentence (e.g., *She reached for the sweets*) and rated how strongly the sentence reflected the meaning of a paraphrase of the idiom (e.g., *She has always aspired to unattainable goals*).

Canonical idiomatic phrases and control sentences received highest and lowest paraphrase ratings, respectively, with modified constituents in between. Further, idioms with modified verbs were rated higher in matching the figurative meaning than idioms with modified prepositions or nouns. These findings indicate that the figurative meaning was assembled in spite of the modifications and support the notion that idioms are not fully "semantically fixed". Rather, modified constituents that activate meanings similar to those of the canonical constituents are good candidates in contributing to the figurative meaning of an idiom. We discuss psycholinguistic models on idiom comprehension.

Eva Smolka & Carsten Eulitz. 2020. *Can you reach for the planets or grasp at the stars?* – Modified noun, verb, or preposition constituents in idiom processing. In Sabine Schulte im Walde & Eva Smolka (eds.), *The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective*, 179–204. Berlin: Language Science Press. DOI:10.5281/zenodo.3598566


## **1 Introduction**

Idioms like *nach den Sternen greifen* (literal, L, and figurative, F, translation: 'reach for the stars') represent a special type of multiword expression. As with other semantically opaque word formations, the figurative meaning of idioms is not derived compositionally from the meaning of the constituents and their syntactic assembly. For example, the figurative meaning of the idiom *She spilled the beans* cannot be derived by combining the meaning of the individual constituents (*she, spilled, the, beans*) and their syntactic combination ('an agent spilling some object') as would be the case in *She spilled the coffee*, despite the parallel syntactic structure. Hence, one of the aims of linguistic theory (e.g., Grice 1975; 1978) has been the formulation of distinguishing criteria for idiomatic as compared to literal multiword expressions. The most important of these are semantic fixedness and syntactic anomaly. Semantic fixedness specifies that the figurative meaning does not allow the replacement of any of the constituents (e.g., *\*she dropped the beans; \*she spilled the seeds/pellets*), while syntactic fixedness indicates that the figurative meaning restricts the syntactic transformations that an idiomatic expression may undergo (e.g., *\*the beans were spilled by her; \*she spilled the secret beans*).

Linguistic and psycholinguistic researchers are thus baffled by the question of how idiomatic meaning is processed and stored in lexical memory (Burger 2003; 2004; Cacciari & Glucksberg 1994; Gibbs Jr. 1994; 2002; Swinney & Cutler 1979; for a review see Titone & Connine 1999; Titone & Libben 2014). In particular, it remains an unresolved question whether the meaning of an idiom is represented separately from the meaning of its parts, and how the figurative meaning is assembled. Seminal studies argued for a non-compositional representation in which the whole figurative meaning of an idiomatic phrase is stored as a distinct entry, the idiom word, in the mental lexicon, similar to the representation of a complex word like *Finanzmarktaufsichtsbehörde* ('financial market supervisory authority'). Idiomatic processing, the process by which figurative meaning is retrieved, is thus assumed to be independent from the process by which literal meaning is computed (Bobrow & Bell 1973; Gibbs Jr. 1980; Swinney & Cutler 1979).

In contrast, hybrid approaches assume that idioms are both unitary (i.e. each idiom possesses a distinct lexical entry for its figurative meaning) and compositional (i.e. composed of the single word lemmas of the constituents). The constituents are first processed literally until the idiom key or something akin to a unitary entry that carries the idiomatic concept is reached and activated (Cacciari & Tabossi 1988; Caillies & Butcher 2007; Connine et al. 1992; Cutting & Bock 1997; Gibbs Jr. & Nayak 1989; Holsinger & Kaiser 2013; Sprenger et al. 2006; Titone & Connine 1999). For example, even though idioms are syntactically analyzed similarly to literal sentences, Cutting & Bock (1997) postulate a distinct lexical concept node that is activated by the idiomatic concept. Similarly, Sprenger et al. (2006) assume a so-called superlemma like *spill-the-beans* that specifies the information relating to that idiom, such as the single constituents (i.e. *spill*, *the*, and *beans*), their syntactic functions (subject, direct object), syntactic categories (noun phrase, prepositional phrase), and parts of speech (noun, verb). Other hybrid models assume that the literal meanings of the constituents are activated only before the unitary entry is reached. For example, the configuration hypothesis (e.g., Cacciari & Tabossi 1988) postulates a so-called idiom key – the point at which the specific word configuration renders an idiom with figurative meaning. Words of a sentence are processed in a literal way until the idiom key is reached and the word formation is recognized as expressing figurative meaning. As soon as the idiom key has been hit, only the figurative meaning of the idiom is processed and remains activated, while the literal activations disappear.

However, Smolka and colleagues (Rabanus et al. 2008; Smolka et al. 2007) observed that the literal meaning of verbs remains accessible even after the idiom key has been hit. In two sentence priming experiments, participants read an idiomatic sentence, such as *Sie hat ihm gründlich den Kopf gewaschen* (word by word, W: 'she has him thoroughly the head washed'; literal, L: 'She thoroughly washed his head'; figurative, F: 'She gave him a piece of her mind') and made lexical decisions about words associated with the figurative meaning (e.g. *Standpauke* 'telling-off'), about associations with the literal meaning of the verb (e.g. *Kleidung* 'clothes'), and about matched unrelated words.

Because all sentences were highly predictable (i.e., with cloze probabilities, on average, higher than 87%), the idiom key – the point at which the constituents are recognized to form an idiom – should occur before the sentence-final word (e.g. *gewaschen* 'washed'). The sentences were presented visually and targets were presented 500 ms after the presentation of the verb participle to make sure that the figurative meaning was available. Under these experimental conditions, the configuration hypothesis (Cacciari & Tabossi 1988) predicts figurative meaning activations only. However, the results of both studies showed that associations with the literal meaning of the verb were activated to the same degree as were associations with the figurative meaning.

The authors concluded that (1) the literal meaning of single word constituents is accessed during figurative processing and that (2) the literal meaning, at least that of verbs, remains activated even after the figurative meaning of the idiom has been recognized (e.g., Cacciari & Tabossi 1988; Cutting & Bock 1997; Sprenger et al. 2006), and (3) described a model of idiom comprehension that incorporates the complexity of idiom processing: the meaning of the single constituents is activated, and the joint co-activation of the single constituents activates the figurative meaning at the conceptual level. Note that hybrid models that assume an idiom key specify, in contrast, that the literal meaning of the constituents is no longer recalled once the figurative meaning of the idiom is recognized.

The above findings give rise to the following questions: if a single idiomatic constituent activates its literal meaning alongside the figurative meaning, and if the joint activation of idiomatic constituents triggers the figurative meaning, will a close associate of the idiomatic constituent (that activates a similar meaning) contribute to the activation of the figurative meaning of the idiom? For example, will the word *planets* in the configuration *reach for the planets* activate the figurative meaning of *reach for the stars*? A positive finding would indicate that idioms are not as semantically fixed as current models on idiom processing assume (e.g., Sprenger et al. 2006). Furthermore, are some constituents of the idiom more susceptible to modification than others? That is, does the word category of an idiomatic constituent – whether it is a verb, a noun, or a preposition – influence whether the constituent can be modified without losing the figurative meaning?

Indeed, in a recent study, Geeraert et al. (2017) observed that noun constituents of idioms may be modified to some degree. Participants rated the acceptability of idioms in their canonical form (e.g. *... they went through the ceiling*), when idioms were partial forms (e.g. *... they went through it*), when they held an integrated concept (e.g. *... they went through the investment roof*), or when they were idiom blends (e.g. *... they suddenly went through the charts*). Modifications of the idiom made it less acceptable; however, the degree of acceptability depended on the type of variation, indicating that modifications with near synonyms (*roof – ceiling*) or integrated concepts (*investment roof*) were more acceptable than other variations. The authors concluded that their findings challenge any theories on idiom processing that assume fixed units for the specification of the figurative meaning, be it multiword form units (Bobrow & Bell 1973), superlemmas (e.g. Sprenger et al. 2006), or word configurations (Cacciari & Tabossi 1988).

The aim of the present study was to examine the semantic fixedness of idioms in more detail: (a) Will the figurative meaning of an idiom be retained, if an idiomatic constituent, such as the noun, verb, or preposition, is modified? (b) Will the word category of an idiomatic constituent (noun, verb, preposition) affect whether a modification will preserve the figurative meaning?

For this purpose, we conducted three sentence paraphrase experiments. Each canonical idiomatic sentence, such as *Sie hat immer nach den Sternen gegriffen* (L: 'She always reached for the stars'; F: 'She always reached for the stars') was presented in three versions: (1) with its canonical constituent, (2) with the canonical constituent replaced by a closely associated word, or (3) with the canonical constituent replaced by an unrelated word. We manipulated the noun constituent in Experiment 1, the verb constituent in Experiment 2, and the preposition in Experiment 3. In Experiment 1, the idiomatic noun constituent (e.g. *stars*) was replaced by a closely associated noun (e.g. *planets*), as in *Sie hat nach den Planeten gegriffen* (L: 'She reached for the planets') or by an unrelated noun (e.g. *sweets*), as in *Sie hat nach den Bonbons gegriffen* (L: 'She reached for the sweets'). In Experiment 2, the idiomatic verb constituent (e.g. *reach*) was substituted by a closely associated verb (e.g. *grasp*), as in *Sie hat nach den Sternen gelangt* (L: 'She grasped at the stars') or by an unrelated verb (e.g. *ask*), as in *Sie hat nach den Sternen gefragt* (L: 'She asked for the stars'). In Experiment 3, the idiomatic preposition was replaced by another preposition, as in *Sie hat zu den Sternen gegriffen* (L: 'She reached to the stars') or by an unrelated prepositional phrase (that held the original preposition of the idiom), as in *Sie hat nach den Bonbons gegriffen* (L: 'She reached for the sweets'). Each sentence was paired with the paraphrase of the idiomatic sentence, *Sie hat immer etwas Unerreichbares angestrebt* (L: 'She always strived for something unreachable'), and participants rated on a scale from 1 to 7 how well the meanings of the two sentences mirrored each other. Examples of idiomatic sentences, their modifications and paraphrases are given in Tables 1–3.

In all three experiments, we used idiomatic sentences and minimized the influence of some confounding variables by controlling the following factors: (a) the number of words was the same in all sentences, that is, each sentence consisted of seven words; (b) all sentences had the same structure (subject-verb-prepositional phrase-participle) and were presented in the perfect tense, so that the verb was always sentence-final; and (c) all sentences had a high cloze probability (on average 90%), ensuring that the sentence-final word was highly predictable. This ensured that the phrasal meaning was processed and the word configuration was interpreted as figurative before the last word of a sentence. Finally, to provide a strong basis for the generalization of our findings, we examined between 33 and 39 different idiomatic phrases in each experiment.

If "unitary" entries define the idiomatic constituents, then sentences whose idiomatic constituents are replaced by close associates will not be considered to hold the figurative meaning and should yield paraphrase ratings similar to sentences with unrelated constituents. If, however, the assumptions hold (a) that each idiomatic constituent activates its literal meaning, (b) that a close associate of an idiomatic constituent will activate a similar literal meaning and (c) will thus contribute to the joint co-activation of the figurative meaning, sentences holding close associates of an idiomatic constituent will be rated as higher in reflecting the figurative meaning than those with unrelated constituents. Furthermore, if the assumption holds that the verb is the structural center of the phrase (as assumed by Rabanus et al. 2008 and Smolka et al. 2007), the modification of the verb constituent will differ from that of the noun or preposition constituents.


Table 1: Examples of sentence triplets for idiomatic phrases with mod[…]

Table 2: Examples of sentence triplets for idiomatic phrases with modified verbs and their corresponding paraphrases in Experiment 2.

Table 3: Examples of sentence triplets for idiomatic phrases with mod[…]

## **2 Experiment 1**

## **2.1 Method**

### **2.1.1 Participants**

Thirty-six university students, all native speakers of German, participated in the experiment for course credit or payment.

### **2.1.2 Materials**

Thirty-nine idiomatic phrases were selected for the sentence paraphrase test. We defined an idiomatic phrase as a verb phrase (a) where both the verb and its complement are used in a nonliteral way to produce an overall idiomatic interpretation, (b) that shows some kind of morphosyntactic anomaly, and (c) whose figurative meaning is lexicalized. In the light of these three properties, the idiomaticity of the phrases selected was agreed upon by three independent judges and further verified by reference to an idiomatic phrase dictionary (Worsch & Scholze-Stubenrecht 2002).

Each idiomatic sentence consisted of seven words and was phrased in the perfect tense, which places the past participle of the verb in sentence-final position. All idiomatic sentences were chosen from the pool of sentences tested in the sentence-completion experiment described in Smolka et al. (2007). To ensure that their figurative meaning was the dominant reading, only idiomatic phrases with high sentence completion rates were selected. That is, these sentences were completed with words that produced the figurative meaning in 93% of the cases (range 52% to 100%).

### 2.1.2.1 Sentence completion task

More than 1,100 sentences with literal and figurative meanings were tested in a sentence completion task (for a more detailed description see Smolka et al. 2007). For the completion task, the last word of a sentence (i.e. the past participle) was omitted, and the sentence was completed by between 25 and 32 monolingual native speakers of German in an online portal (*Language experiments portal* by Keller et al. 1998). For each sentence, the number of sentence completions with a specific verb was counted. For example, 19 of the 25 participants who saw the sentence *Sie hat immer nach den Sternen* (L: 'She always for the stars') completed it with the participle *gegriffen* ('reached') and thus finalized the sentence in its figurative meaning *She always reached for the stars*. The other 6 participants completed the sentence with the verb *geschaut* ('looked at') and thus yielded the literal meaning 'She always looked at the stars'.
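
The counting step described above can be illustrated with a short sketch. It is not the authors' script; the function name `completion_rates` is a hypothetical helper, and the only data used are the 19 and 6 responses reported for the example sentence.

```python
from collections import Counter


def completion_rates(completions):
    """Return the share of responses per completion word, in percent."""
    counts = Counter(completions)
    total = len(completions)
    return {word: 100 * n / total for word, n in counts.items()}


# Responses reported in the text for "Sie hat immer nach den Sternen ...":
# 19 participants wrote "gegriffen" (figurative), 6 wrote "geschaut" (literal).
responses = ["gegriffen"] * 19 + ["geschaut"] * 6

rates = completion_rates(responses)
figurative_rate = rates["gegriffen"]
print(f"figurative completion rate: {figurative_rate:.0f}%")
```

For this sentence, the figurative completion rate is 19 out of 25 responses, which lies within the reported range of 52% to 100%.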

### 2.1.2.2 Noun association test

Each idiomatic sentence, such as *She reached for the stars*, was phrased in three versions, holding either (a) the canonical idiomatic noun constituent (I), such as *Sterne* ('stars'), (b) an associated noun (A), such as *Planeten* ('planets'), or (c) an unrelated noun (U), such as *Bonbons* ('sweets'). Table 1 provides examples of idiomatic sentences and the noun modifications; Table 4 provides the stimulus characteristics of the idiomatic sentences and their corresponding noun constituents.

> Table 4: Idiomatic sentences and stimulus characteristics of the idiomatic, modified, and unrelated noun constituents in Experiment 1. *Notes:* N = number of items, Lemma = mean lemma frequency per one million, taken from CELEX (Baayen et al. 1993), Association = mean meaning association with idiomatic constituent, Closure = mean sentence completion in %.


To find close associates, two noun associations (e.g. *planets* and *moons*) were selected for each idiomatic noun constituent that was to be modified (e.g. *stars* in the idiomatic phrase *She reached for the stars*). Care was taken that the associations were unrelated to the figurative phrasal meaning, and that the gender and number inflections of the noun associations fitted the original idiomatic sentence. To avoid episodic effects, the same noun occurred only once in the whole experiment.

The strength of the associations was assessed in a pre-test that comprised two lists. The two noun associates of each idiomatic constituent were allocated to the two lists; in both lists, the idiomatic noun constituent was paired with its associated noun and with an unrelated noun. For example, in List 1, the idiomatic constituent *Sterne* ('stars') was presented with the association *Planeten* ('planets') and the unrelated noun *Praxis* ('practice'); in List 2, *Sterne* ('stars') was presented with *Monde* ('moons') and *Praxis* ('practice').

Forty participants (who did not participate in the paraphrase experiment) rated on a scale from 1 (not at all) to 7 (strongly) how strongly the two nouns (e.g., *stars – planets*) are meaning-related. Noun associations were selected as modifications of the original idiomatic noun, if they received high ratings (mean rating 5.8), and if their lemma and surface frequencies (taken from CELEX, see Baayen et al. 1993) were well matched with those of the idiomatic constituent.

### 2.1.2.3 Paraphrases

For each idiomatic phrase, we constructed a paraphrase by looking up the definition of the idiom in the idiomatic phrase dictionary (Worsch & Scholze-Stubenrecht 2002). So that it resembled the idiomatic sentence in form, the paraphrase was cast in the past perfect tense and given the same subject as the idiomatic phrase.

### **2.1.3 Procedure**

Three lists were constructed, each of which included one member of the sentence triplet of every idiomatic phrase. The three versions of an idiomatic phrase were rotated over the three lists by Latin square, so that a given list contained the idiomatic sentence with either the canonical idiomatic (I), the associated (A), or the unrelated (U) noun. Each of the three sentences of a triplet was paired with the same paraphrase of the idiomatic sentence (see Table 1 for examples). Altogether, each list comprised 39 sentence pairs.
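
A Latin-square rotation of this kind can be sketched as follows. The chapter does not report how items were indexed or which version went to which list, so the modular rotation below and the name `latin_square_lists` are assumptions made for illustration only.

```python
CONDITIONS = ["I", "A", "U"]  # idiomatic, associated, unrelated


def latin_square_lists(n_items, conditions=CONDITIONS):
    """Assign each item one condition per list, rotating conditions across lists.

    Assumption: items are numbered 0..n_items-1 and rotated by simple modular
    arithmetic; the original study may have used a different rotation.
    """
    n_lists = len(conditions)
    lists = []
    for list_idx in range(n_lists):
        assignment = {
            item: conditions[(item + list_idx) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


# 39 idiomatic phrases, as in Experiment 1.
lists = latin_square_lists(39)
for idx, lst in enumerate(lists, start=1):
    per_condition = {c: sum(1 for v in lst.values() if v == c) for c in CONDITIONS}
    print(f"List {idx}: {len(lst)} sentence pairs, {per_condition}")
```

With 39 items and three conditions, each list holds 13 sentences per condition, and every item appears in a different condition in each list, so no participant sees the same idiomatic phrase twice.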

Each participant saw only one of the three lists; assignment to lists was randomized. Paraphrase tests were distributed via email. Participants were asked to rate (on a scale from 1 to 7) how strongly the two sentences reflected each other's meaning. The instructions included two examples: one sentence pair with high meaning relatedness, the other sentence pair with low meaning relatedness.

## **2.2 Results**

In this and the following experiments, we used R (R Core Team 2013) and lme4 (e.g., Bates 2005; Bates et al. 2015; Baayen et al. 2008) to perform linear mixed effects modeling (LMM). As random effects, we had intercepts for participants and items (i.e. sentences). As fixed effects, we included the factor Modification Type (idiomatic/associated/unrelated), the Sentence Closure of the idiomatic phrase, and the Frequency of the constituent. The absolute and normalized lemma frequencies were taken from the CELEX lexical database (Baayen et al. 1993) and were log-transformed and centered (e.g. Winter 2013). All *p*-values were calculated on the basis of the Satterthwaite approximation by using the lmerTest package (Kuznetsova et al. 2017). In this and the following experiments, we applied a forward procedure for the model selection, starting with a minimal model and adding predictors only when they improved the model fit. Model fit was compared by means of the Akaike Information Criterion (AIC); a model was preferred when the AIC difference between models exceeded 4 (Sakamoto et al. 1986).
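
The forward selection rule can be made concrete with a small sketch. It assumes that the threshold of 4 is applied as a minimum AIC improvement before a predictor is retained, which is one reading of the procedure. The log-likelihoods and parameter counts below are invented placeholders, not values from the study, and `forward_select` is a hypothetical helper rather than an lme4 function.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(candidates, threshold=4.0):
    """Pick a model from candidates ordered from minimal to most complex.

    A larger model replaces the current best only if it lowers the AIC by
    more than `threshold` (assumed reading of the > 4 criterion).
    """
    best = candidates[0]
    for model in candidates[1:]:
        improvement = aic(best[1], best[2]) - aic(model[1], model[2])
        if improvement > threshold:
            best = model
    return best[0]


# (name, log-likelihood, number of parameters): purely illustrative numbers.
models = [
    ("intercepts only", -2900.0, 4),
    ("+ Modification Type", -2500.0, 6),
    ("+ Sentence Closure", -2499.5, 7),
]

for name, loglik, k in models:
    print(f"{name}: AIC = {aic(loglik, k):.1f}")
print("selected:", forward_select(models))
```

In this toy run, adding Modification Type lowers the AIC by far more than 4, whereas adding Sentence Closure does not, so the simpler model is retained. This mirrors the pattern reported for the experiments without reproducing their numbers.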

Table 5: Fixed effects of the predictors in the linear mixed-effect model for the paraphrase ratings in Experiment 1. *Notes:* significance code: \*\*\* < 0.0001.


The LMM analysis of Experiment 1 indicated that the best model fit included the fixed-effect factor Modification Type; no other fixed-effect factors were significant. Table 5 summarizes the effects; the left panel of Figure 1 depicts the ratings. The results were straightforward: paraphrase ratings were highest for idiomatic phrases that held the canonical idiomatic noun constituent (mean = 6.25, SD = 1.81), lower for phrases in which the canonical noun was replaced by a closely associated noun (mean = 4.13, SD = 2.33), and lowest for phrases with unrelated nouns (mean = 2.52, SD = 2.21).

## **3 Experiment 2**

## **3.1 Method**

### **3.1.1 Participants**

Fifty-eight university students who had not participated in the previous experiment participated in the experiment for course credit or payment. All were native speakers of German.

Figure 1: Paraphrase ratings on a scale from 1–7 for idiomatic sentences holding idiomatic, modified, or unrelated constituents. Noun constituents were manipulated in Experiment 1 (left panel), prepositions in Experiment 3 (mid panel), and verb constituents in Experiment 2 (right panel). Y-bars indicate standard errors of the mean.

### **3.1.2 Materials**

Thirty-three idiomatic phrases were selected for the sentence paraphrase test according to the same principles as in Experiment 1: They were fully idiomatic phrases as defined in Experiment 1 and were selected from the sentence pool described in Experiment 1. To ensure that their figurative meaning was the dominant reading, all had high sentence completion rates, that is, they were completed with verbs that produced the figurative meaning in 91% of the cases (range 52% to 100%).

Each idiomatic sentence, such as *Sie hat nach den Sternen gegriffen* (F: 'She reached for the stars'), was cast in three versions, holding either (a) the canonical idiomatic verb (I), such as *gegriffen* ('reached'), (b) an associated verb (A), such as *gelangt* ('grasped'), or (c) an unrelated verb (U), such as *gefragt* ('asked'). For 26 of the 33 unrelated verbs, the noun constituent that precedes the verb was also modified to create a meaningful sentence, such as *Sie hat nach den Sternzeichen gefragt* (L: 'She asked for the zodiacs'). See Table 2 for examples of idiomatic sentences and their verb modifications. Table 6 provides the stimulus characteristics of the idiomatic sentences and the corresponding modifications.

> Table 6: Idiomatic sentences and stimulus characteristics of the idiomatic, modified, and unrelated verb constituents in Experiment 2. *Notes:* N = number of items, Lemma = mean lemma frequency per one million, taken from CELEX (Baayen et al. 1993), Association = mean meaning association with idiomatic constituent, Closure = mean sentence completion in %.


### 3.1.2.1 Verb association test

Two verb associations (e.g. *fassen* 'grip' and *langen* 'grasp') were selected for each idiomatic verb that was to be modified (e.g. *greifen* 'reach'). Care was taken that the verb associations were unrelated to the figurative phrasal meaning and that they generated a meaningful sentence. To avoid episodic effects, the same verb occurred only once in the whole experiment.

The strength of the associations was assessed in a pre-test. The two associates of an idiomatic verb were allocated to two lists; in both lists, the idiomatic verb was paired with an associated and an unrelated verb. For example, in List 1, the idiomatic verb *greifen* ('reach') was presented with the association *fassen* ('grip') and the unrelated verb *kleben* ('stick'); in List 2, *greifen* ('reach') was presented with *langen* ('grasp') and *kleben* ('stick'). Each list tested 112 verb pairs.

Thirty participants (who did not participate in the paraphrase experiment) rated on a scale from 1 (not at all) to 7 (strongly) how strongly the meanings of the two verbs (e.g. *greifen – langen*) are related. Verb associations were selected as modifications of the original idiomatic verb if they received high ratings (mean rating 5.8) and if their lemma and surface frequencies (taken from CELEX, see Baayen et al. 1993) were well matched with those of the idiomatic verb.

### 3.1.2.2 Paraphrases and fillers

The same procedure as in Experiment 1 was used to construct the paraphrases for each idiomatic phrase (see also Table 2). In addition to the 33 idiomatic sentences, 22 literal sentences with the same sentence structure were used as fillers and were paired with unrelated paraphrases.

### **3.1.3 Procedure**

Participants were randomly assigned to a list. Paraphrase tests were distributed via email. Participants rated on a scale from 1 to 7 how strongly the two sentences reflected each other's meanings. The instructions included two examples, one sentence pair with high meaning relatedness, the other sentence pair with low meaning relatedness. As in Experiment 1, three lists were constructed in such a way that each included one member of the sentence triplet of an idiomatic phrase, either with the idiomatic (I), associated (A), or unrelated (U) verb. Each of the three sentences of a triplet was paired with the same paraphrase of the idiomatic sentence (see Table 2 for examples). The same 22 filler sentence pairs were added to each list, so that, altogether, each list comprised 55 sentence pairs. The number of fillers ensured that 60% of the sentences in a list were not related in meaning to their paraphrase.
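
The 60% figure can be checked with simple arithmetic. The sketch below assumes that the three conditions were evenly balanced within a list (11 sentences each) and that sentences with the idiomatic or the associated verb count as related to the paraphrase. Both assumptions are inferred from the reported proportion and are not stated explicitly in the text.

```python
n_items = 33      # idiomatic phrases per list (from the text)
n_fillers = 22    # literal fillers paired with unrelated paraphrases (from the text)

# Assumption: I, A, and U sentences are evenly balanced within a list.
per_condition = n_items // 3

related = 2 * per_condition            # I and A sentences
unrelated = per_condition + n_fillers  # U sentences plus fillers
total = n_items + n_fillers

unrelated_share = unrelated / total
related_share = related / total

print(f"total pairs: {total}")
print(f"unrelated: {unrelated} ({unrelated_share:.0%})")
print(f"related: {related} ({related_share:.0%})")
```

Under these assumptions, 33 of the 55 sentence pairs in a list are unrelated, which matches the stated proportion.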

## **3.2 Results**

We applied the same LMM analyses as described in Experiment 1. The best model fit included the fixed-effect factor Modification Type and is summarized in Table 7; the right panel of Figure 1 depicts the paraphrase ratings. As in Experiment 1, paraphrase ratings were highest for idiomatic phrases that held the canonical idiomatic verb (mean = 6.54, SD = 1.46), lower for phrases in which the canonical verb was replaced by a closely associated verb (mean = 5.96, SD = 1.80), and lowest for phrases with unrelated verbs (mean = 2.18, SD = 2.16).

Table 7: Fixed effects of the predictors in the linear mixed-effect model for the paraphrase ratings in Experiment 2. *Notes:* significance code: \*\*\* < 0.0001, \*\* < 0.01.


## **4 Experiment 3**

## **4.1 Method**

### **4.1.1 Participants**

Fifty university students, all native speakers of German, participated in the experiment for course credit or payment.

### **4.1.2 Materials**

Thirty-three idiomatic phrases were selected for the sentence paraphrase test according to the same principles as in Experiment 1: They were fully idiomatic phrases and selected from the same sentence pool as described in Experiment 1. To ensure that their figurative meaning was the dominant reading, all had high sentence completion rates, that is, they were completed with words that produced the figurative meaning in 90.2% of the cases (range 52% to 100%).

Each idiomatic sentence, such as *Sie hat immer nach den Sternen gegriffen* (F: 'She always reached for the stars'), was cast in three versions, holding either (a) the canonical idiomatic preposition (I), such as *nach* ('after'), (b) a modified preposition (A), such as *zu* ('to'), or (c) an unrelated prepositional phrase that held the same preposition as the idiomatic phrase (U), such as *nach den Bonbons* ('for the sweets'). See Table 3 for examples of idiomatic sentences and their prepositional modifications. Table 8 provides the stimulus characteristics of the idiomatic sentences and the corresponding modifications.

Table 8: Idiomatic sentences and stimulus characteristics of the idiomatic and modified preposition, and unrelated prepositional phrase in Experiment 3. *Notes:* N = number of items, Lemma = mean lemma frequency per one million, taken from CELEX (Baayen et al. 1993), Closure = mean sentence completion in %.


### 4.1.2.1 Preposition substitution

Because prepositions may take many different meanings, association tests are not applicable. Instead, two native speakers selected a preposition (e.g. *zu* 'to') that best matched the meaning of the idiomatic preposition (e.g. *nach* 'after'). We made sure that the modified preposition fitted the sentence frame and generated a meaningful sentence. As an unrelated control condition, we used the idiomatic preposition and combined it with an unrelated noun phrase (see Table 3).

### 4.1.2.2 Paraphrases and fillers

The same procedure as in Experiment 1 was used to construct the paraphrases for each idiomatic phrase. In addition to the 33 idiomatic sentences, 22 literal sentences with the same sentence structure were used as fillers and were paired with unrelated paraphrases.

### **4.1.3 Procedure**

Three lists were constructed in such a way that each included the idiomatic phrase with either the canonical idiomatic preposition (I), the modified preposition (A), or the unrelated prepositional phrase (U). Each of the three sentence triplets was paired with the same paraphrase of the idiomatic sentence (see Table 3 for examples). Twenty-two fillers in each list reduced the relatedness proportion (between the sentences of a sentence pair) in a list to 40%. The rest of the procedure was the same as in the previous experiments.

## **4.2 Results**

We applied the same LMM analyses as described in Experiment 1. The best model fit included the fixed-effect factor Modification Type only and is summarized in Table 9; the middle panel of Figure 1 depicts the paraphrase ratings. As in the previous experiments, paraphrase ratings were highest for idiomatic phrases that held the canonical idiomatic preposition (mean = 6.26, SD = 1.85), lower for phrases with a modified preposition (mean = 5.29, SD = 2.21), and lowest for phrases with an unrelated prepositional phrase (mean = 2.46, SD = 2.3).

## **5 Post-hoc analysis of Experiments 1–3**

The results of all three experiments showed that idiomatic constituents may be modified by a close associate and still yield the figurative meaning. A visual inspection of Figure 1 suggests that modified verbs are better at yielding the figurative meaning than either nouns or prepositions. The following LMM analysis was conducted to test whether the word category of a constituent (noun, verb, preposition) affects how strongly a modification preserves the figurative meaning.

Table 9: Fixed effects of the predictors in the linear mixed-effect model for the paraphrase ratings in Experiment 3. *Notes:* significance code: \*\*\* < 0.0001.

We applied the same LMM analysis as in the previous experiments. As random effects, we had intercepts for participants and items (i.e. sentences). In addition to the previously used fixed effects – Modification Type (idiomatic/associated/unrelated), the Sentence Closure of the idiomatic phrase, and the Frequency of the constituent (log-transformed and centered, absolute lemma frequencies from CELEX) – we included the factor Experiment (corresponding to the tested constituent). We applied a forward procedure for the model selection, and obtained the best model fit by comparing the Akaike Information Criterion (AIC) statistics between models.

The best model fit included the fixed-effect factors Modification Type and Experiment, and an interaction between the two. Table 10 summarizes the effects. The results reflect the findings depicted in Figure 1. Overall, as in each of Experiments 1–3, paraphrase ratings were highest for idiomatic phrases that held the canonical constituent, lower for phrases in which the canonical constituent was replaced by a closely associated constituent, and lowest for phrases with unrelated constituents. Across experiments, however, sentences with modified preposition or verb constituents received higher ratings, and were thus perceived as better representing the figurative meaning, than sentences with modified nouns. Further, sentences holding unrelated verbs received lower paraphrase ratings than sentences holding unrelated nouns, indicating that unrelated verbs are perceived as lowest in representing the figurative meaning.
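
One way to write down a model with this structure is given below. The notation is an assumption introduced here for illustration: $y_{ps}$ is the rating of participant $p$ for sentence $s$, $\mathrm{Mod}$ and $\mathrm{Exp}$ stand for the dummy-coded three-level factors Modification Type and Experiment (so each $\beta$ term is shorthand for a set of contrast coefficients), and $u_p$ and $w_s$ are the random intercepts for participants and items. The exact contrast coding used in the original analysis is not reported in this section.

```latex
\[
y_{ps} = \beta_0
       + \beta_{M}\,\mathrm{Mod}_{ps}
       + \beta_{E}\,\mathrm{Exp}_{ps}
       + \beta_{M \times E}\,(\mathrm{Mod}_{ps} \times \mathrm{Exp}_{ps})
       + u_p + w_s + \varepsilon_{ps},
\]
\[
u_p \sim \mathcal{N}(0, \sigma_u^2), \qquad
w_s \sim \mathcal{N}(0, \sigma_w^2), \qquad
\varepsilon_{ps} \sim \mathcal{N}(0, \sigma^2).
\]
```

In lme4 formula syntax, a model of this shape would read `rating ~ ModificationType * Experiment + (1 | participant) + (1 | item)`, where the variable names are hypothetical.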

Table 10: Fixed effects of the predictors in the linear mixed-effect model for the paraphrase ratings combining Experiments 1–3. *Notes:* Modified = modified constituent, Unrelated = unrelated constituent, Exp. = Experiment, significance code: \*\*\* < 0.0001, \* < 0.05.


## **6 General discussion**

The present study investigated whether idioms are semantically fixed, as suggested by established linguistic and psycholinguistic models on the processing and production of idioms. We asked first, whether idiomatic constituents may be modified while retaining the figurative meaning, and second, whether some idiomatic constituents are more susceptible to modification than others in keeping the figurative meaning.

Previous studies observed that idiomatic verb constituents activate their literal meaning while they contribute to the activation of the figurative meaning (e.g., Rabanus et al. 2008; Smolka et al. 2007). In the present study, we thus asked whether not only the constituent itself but also a close associate of the constituent (that activates a similar literal meaning) will contribute to the activation of the figurative meaning. We compared the processing of canonical idiomatic phrases like *Sie hat immer nach den Sternen gegriffen* (L: 'She always reached for the stars') with sentences in which one of the idiomatic constituents (i.e., the noun, verb, or preposition) was modified by a close semantic associate, as in *Sie hat immer nach/zu den Sternen/Planeten gegriffen/gelangt* (L: 'She always reached/grasped for/to the stars/planets'). The results of the paraphrase ratings indicated that the figurative meaning of the idiom is recognized even when a semantic associate replaces the canonical idiomatic constituent. That is, modified idiomatic constituents may contribute to the generation of the figurative meaning of the idiom.

Our findings corroborate those of Geeraert et al. (2017), who showed that the figurative meaning is accepted when idiomatic noun constituents are modified by near synonyms or semantic associates (e.g. *they went through the ceiling*). We have extended the finding on noun constituents to other idiomatic constituents, such as the verb and the preposition, and have shown that they may be modified as well. Indeed, the modifications of all types of constituents (nouns, verbs, and prepositions) were rated as better reflecting the figurative meaning than unrelated constituents.

We further asked whether a particular type of constituent (noun, verb, or preposition) preserves the figurative meaning more strongly than others. Indeed, our results show that modified verbs are stronger than modified nouns or prepositions at activating the figurative meaning. This finding fits well with the assumption by Hamblin & Gibbs Jr. (1999) that the meaning of the verb in idiomatic phrases may influence the meaning of the idiom. When a verb such as *kick* in *kick the bucket* was replaced by a verb that expressed a fast and sudden action, such as *punt*, this substitution was rated as better preserving the meaning of the idiom than a verb that did not represent the inherent meaning of the verb, such as *nudge*. Hamblin and Gibbs concluded that the verb-inherent action was transferred to the meaning of the whole idiomatic phrase.

In the following paragraphs, we search for a plausible reason why a modified verb activates the figurative meaning more strongly than a modified noun or preposition does. Since there are, to our knowledge, no studies that directly compare the processing of different idiomatic constituents (nouns, adjectives, verbs, prepositions) and how each contributes to the overall figurative meaning, we allow ourselves to speculate why verb constituents of idioms are processed differently from noun or prepositional constituents.

The processing of modified verbs in a way similar to canonical ones may have been further facilitated by the fact that, in the present study, verbs occupy the sentence-final position. From a semantic perspective, the verb is thus partly processed even before it has been encountered. Consider the German idiom *Ich habe ihn sehr ins Herz geschlossen* (L: 'I locked him into the heart'; F: 'I am very fond of him'). The German preposition *in(s)* governs both the dative case for locations (indicating the semantic feature [+static]) and the accusative case for directions (indicating the feature [–static]; Gansel 1992). Because the above example assigns an accusative, the semantic feature [–static] of the participle *geschlossen* ('locked') can be anticipated. Hence, certain semantic properties of the verb are processed before it is realized.

Also from a syntactic perspective, the verb is partially processed even before it has been encountered. According to valency theory (e.g., Tesnière 1959), the verb controls the syntactically obligatory complements.<sup>1</sup> These complements, in turn, are dependent on the subcategorization properties of the verb and are predictable as soon as the verb has been processed. In our sentences, where the verb occupies the sentence-final position, the direction of predictability is reversed: The number and type of complements that occur in the sentence constrain the choice of possible verbs in the last position, so that the verb is partially processed even before it has been encountered.

Moreover, the high cloze probabilities of sentences in the present experiments indicate that participants expect the meaning of a specific verb in sentence-final position. Hence, the meaning of the idiomatic verb constituent was activated before it was encountered, so that the modified verb, which activates a similar literal meaning, is stronger in activating the figurative meaning than other (noun or preposition) constituents that are not as expected.

To summarize, if we assume that (a) the verb-inherent action is transferred to the figurative meaning of the idiom (see Hamblin & Gibbs Jr. 1999), (b) the literal meaning of the verb remains activated even after the figurative meaning of the idiom has been recognized (see Rabanus et al. 2008; Smolka et al. 2007), and (c) the syntactic and semantic properties of the verb in the sentence-final position are partly processed before it is encountered, the possibility arises that a close associate of the verb (that activates a literal meaning similar to that of the canonical verb) will trigger the figurative meaning of the idiom.

Overall, the present findings provide evidence against any type of model of idiom comprehension or production that assumes some kind of fixed lexical entry of the idiomatic constituents that generate the figurative meaning, including fixed idiom words (Bobrow & Bell 1973). The present findings also disagree with hybrid models that assume a unitary or fixed representation to capture the idiosyncratic meaning of an idiom, such as the fixed word configuration in the form of an idiom key (e.g., Cacciari & Tabossi 1988), fixed superlemmas (e.g., Sprenger et al. 2006), or fixed lexical concept nodes (e.g., Cutting & Bock 1997). For example, according to the configuration hypothesis (Cacciari & Tabossi 1988), the Italian sentence *Dopo l'ottima prestazione, il tennista era al settimo cielo* (F: 'After the excellent performance, the tennis player was in seventh heaven') is processed literally until the specific word configuration *to be in seventh heaven* is recognized to form the figurative meaning. As soon as the figurative meaning is accessed, the literal meaning activation is dropped and no longer active. Accordingly, the presentation of the noun constituent *cielo* ('heaven') did not activate its literal association *stelle* ('stars'). Because the configuration hypothesis assumes that only the very specific word configuration – the idiom key – renders the figurative meaning, a sentence with a modified word configuration such as *The tennis player was in seventh sky* should not be able to activate the figurative meaning.

<sup>1</sup>With respect to literal language, the verb's valency (i.e. the number of complements it requires) was shown to affect both language production (e.g., Thompson et al. 1997) and language comprehension (Shapiro et al. 1987). However, the verb's valency did not affect the processing of figurative language: Idiomatic sentences holding transitive verbs (that require one obligatory complement) and idiomatic sentences holding ditransitive verbs (that require two obligatory complements) were processed equally fast (Dörre & Smolka 2016).

A similar assumption underlies the concept of the superlemma (Sprenger et al. 2006): A superlemma such as [hit-the-road] specifies the single constituents of the idiom (i.e., *hit, the, road*) as well as their syntactic features and functions. It engages morphosyntactic constraints on the idiomatic configuration to discriminate idiomatic from literal word configurations (Sprenger et al. 2006). Hence, the morphosyntactic constraints of the superlemma [hit-the-road] could not apply to modified constituents such as *hit the street* or *strike the road* and would not retrieve a figurative meaning.

Overall, the present findings provide evidence against any noncompositional lexical representation of the figurative meaning of idioms. By contrast, the present findings fit well with the recent study on idiom variation referred to above (Geeraert et al. 2017). Baayen and colleagues modelled their findings in a naïve discrimination learning (NDL) account (Baayen et al. 2013; 2011; 2016) in which sublexical orthographic units such as letter trigrams are mapped onto meaning units in the form of so-called lexomes. The lexome of an idiom corresponds to a pointer to its semantic vector, such as *to die*, and is activated by the different letter triplets that the idiom contains. Importantly, many different inputs may activate the same lexome, so that *to die, pass away,* and *kick the bucket* will all activate the same lexome *die*. This may explain why modified idioms are acceptable to some degree. However, given that the NDL account does not recognize abstract linguistic categories, such as nouns, verbs, or prepositions, it is unclear how it could account for the finding that the modification of verbs is more effective than that of nouns or prepositions at activating the figurative meaning.
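The mapping from letter trigrams to lexomes can be illustrated with a minimal sketch. It is not the model reported by Geeraert et al. (2017) or Baayen and colleagues. It is a toy learner that applies incremental Rescorla-Wagner updates, the learning rule on which NDL is built, to a handful of invented learning events. The function names (`trigrams`, `train`, `activation`), the learning events, the lexome labels `DIE` and `KICK`, the learning rate, and the number of epochs are all assumptions made for illustration.

```python
from collections import defaultdict


def trigrams(phrase):
    """Return the set of letter trigrams of a phrase ('#' marks word boundaries)."""
    padded = "#" + phrase.replace(" ", "#") + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def train(events, outcomes, epochs=200, rate=0.01):
    """Apply Rescorla-Wagner updates to cue-to-lexome weights over repeated events."""
    weights = defaultdict(float)  # (trigram, lexome) -> association strength
    for _ in range(epochs):
        for phrase, lexome in events:
            cues = trigrams(phrase)
            for outcome in outcomes:
                # Summed support of all cues present in this event for this outcome.
                predicted = sum(weights[(cue, outcome)] for cue in cues)
                target = 1.0 if outcome == lexome else 0.0
                step = rate * (target - predicted)
                for cue in cues:
                    weights[(cue, outcome)] += step
    return weights


def activation(weights, phrase, lexome):
    """Summed support that the trigrams of a phrase give to one lexome."""
    return sum(weights.get((cue, lexome), 0.0) for cue in trigrams(phrase))


# Hypothetical toy learning events: (input phrase, lexome it is paired with).
EVENTS = [
    ("to die", "DIE"),
    ("pass away", "DIE"),
    ("kick the bucket", "DIE"),
    ("kick the ball", "KICK"),
]
OUTCOMES = ["DIE", "KICK"]

weights = train(EVENTS, OUTCOMES)
for phrase in ["to die", "pass away", "kick the bucket", "kick the pail"]:
    print(phrase, round(activation(weights, phrase, "DIE"), 2))
```

In this toy setting, the three trained inputs each come to support the lexome `DIE` strongly, whereas the untrained variant *kick the pail* receives only partial support. That support stems solely from the letter trigrams it shares with trained inputs. The sketch contains no representation of word class, which is the limitation noted above.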

Finally, the present findings fit well with the stem-based account (Günther et al. 2018; Rabanus et al. 2008; Smolka & Libben 2017; Smolka et al. 2007; 2014; 2015; 2019), which is a unitary system for the processing of literal and figurative language: Stems of multiword expressions – ranging from complex verbs and compounds to idioms – activate the literal meanings of the stems, and together the stems co-activate their joint figurative meaning.<sup>2</sup> This holds for the meanings of semantically transparent and opaque complex verbs (e.g., *understand*) and compounds (e.g., *hogwash*) just as for the opaque meaning of idioms (*kick the bucket*). Because the literal meaning of a constituent is activated alongside the figurative meaning of the multiword expression, semantically associated words that activate a meaning similar to that of the idiomatic constituent will contribute to the figurative meaning assembly.

<sup>2</sup>Even though the literal meaning of a constituent is assumed to be activated, figurative meanings are not second-level interpretations that necessitate complete literal interpretations of the utterances on the first level. Rather, figurative interpretations do not block the activation of literal associations (see Gibbs Jr. 2002).

## **7 Conclusion**

The present findings indicate that lexical representations of idioms are not as semantically fixed as has been assumed so far: Modified constituents that activate meanings similar to those of the canonical constituents will co-activate the figurative meaning of the idiom together with the other idiomatic constituents. Modified verb constituents activate the figurative meaning more strongly than modified nouns or prepositions do. Future studies will be necessary to examine how many idiomatic constituents may be modified at once (e.g., *grasp at the planets*) while keeping the figurative meaning of the idiom (e.g., *reach for the stars*).

## **Acknowledgements**

This study was supported by Grant FP561/11 from the Volkswagen Foundation to Eva Smolka. Experiment 1 was part of Sarah Baumann's M.A. thesis; we thank her for conducting the experiments.

## **References**






## The role of constituents in multiword expressions

Multiword expressions (MWEs), including noun compounds (such as *nickname* in English and *Ohrwurm* in German), complex verbs (such as *give up* in English and *aufgeben* in German) and idioms (such as *break the ice* in English and *das Eis brechen* in German), may be interpreted literally but often undergo meaning shifts with respect to their constituents. Theoretical, psycholinguistic, and computational linguistic research remains puzzled by when and how MWEs receive literal vs. meaning-shifted interpretations, what the contributions of the MWE constituents are to the degree of semantic transparency (i.e., meaning compositionality) of the MWE, and how literal vs. meaning-shifted MWEs are processed and computed.

This edited volume presents an interdisciplinary selection of seven papers on recent findings across linguistic, psycholinguistic, corpus-based and computational research fields and perspectives, discussing the interaction of constituent properties and MWE meanings, and how the constituents contribute to the processing and representation of MWEs. The collection is based on a workshop at the 2017 annual conference of the German Linguistic Society (DGfS) that took place at Saarland University in Saarbrücken, Germany.