MODELS AND ANALYSIS OF VOCAL EMISSIONS FOR BIOMEDICAL APPLICATIONS

The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and the elderly. Over the years the initial issues have grown and spread to other fields of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.


FOREWORD
This book of Proceedings includes the contributions presented at the 11th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2019, held in Firenze from 17 to 19 December 2019. That is, 20 years since the very first MAVEBA in 1999! Looking back to those days, I remember well the spirit of adventure that inspired this initiative, both on my side and on that of my colleague Piero Bruscaglioni, with whom I also shared many subsequent MAVEBA editions.
MAVEBA started because of our curiosity and continued thanks to the enthusiasm of the participants. And today? Curiosity and enthusiasm are still there, with the awareness of a fascinating and increasingly interdisciplinary world. The large number of contributions collected in this Proceedings is the clear demonstration of this.
The main subjects concern methods for analyzing hoarseness and retrieving features of the human voice related to particular physiological or neurological conditions, with the aim of assessing reliable procedures for objective, quantitative definition of levels of voice disorders, singing voice parameters, newborn cry features, vocal fold and vocal tract modelling. The interdisciplinarity that has always characterized the MAVEBA workshops is well highlighted by the themes addressed, listed below.
I wish to give special thanks and greetings to the CoMeT Association, which is present at MAVEBA with a large number of its members. This year is a special one for CoMeT, celebrating the 50th anniversary of its foundation, and I am happy and proud to celebrate it together with the twenty-year anniversary of MAVEBA!
The papers presented at MAVEBA and collected in this volume are divided into nine Sessions, two Special Sessions, professionally coordinated by Dr. Franco Fussi and Dr. Philippe Dejonckere, and a Keynote lecture given by Thanasis Tsanas. I am very grateful to the authors for their contributions and to all participants who stimulated the discussion and helped to propose new research themes and methodologies of analysis in a field that will always be evolving, even, and hopefully, in the next twenty years.

Claudia Manfredi
Abstract: The Tomatis electronic ear is a device that could modify the natural audio feedback between the emitted voice and the ears of a talking or singing individual. Our aim was to test whether the device causes quantifiable vocal variations by having the subjects repeat sustained vowel sounds (i.e., /a/, /i/, /u/) with different frequency filters applied by the device. The subjects were 19 native adult Italian speakers (8 females), tested with 4 different filtering methods: unfiltered feedback (control), low pass filter at 4 kHz, high pass filter at 4 kHz and high pass filter at 8 kHz. All subjects quantifiably modified their vocalization in response to the varying methods for at least one letter of each filtering method: 81.29% of the sessions of all subjects were significantly different in fundamental frequency from the control (p<0.05, Kruskal-Wallis test). Among subjects, the variation trend was significant only for the fundamental frequency of the letter /u/ in a particular subgroup categorized by mean fundamental frequency. This initial work shows that the vocal variations caused by the Tomatis device are quantifiable but subject specific, laying the groundwork to test new parameters and to find common trends across configurations.

Keywords:
Tomatis, Electronic Ear, audio feedback, audio stimulation

I. INTRODUCTION
Alfred Tomatis was a French scientist and the founder of Audio-Psycho-Phonology, an auditory rehabilitation methodology that stimulates the ear by modifying the auditory input. This stimulation is delivered through a device called the Electronic Ear. The device is based on a series of amplifiers, filters, and electronic controls; it receives the sound emitted by a source, processes it, and sends it back to the subject through a special headset. Tomatis's theory of listening is the product of a series of rigorous neurophysiological studies, based on the phylogenetic and ontogenetic analysis of the development of the nervous system [1][2][3]. It was fundamental to highlight the common origin and the consequent structuring of the organs responsible for vocal emission (for example, V cranial pair for the musculature of the mandible and for the muscle of the hammer, VII cranial pair for the upper part of the larynx, for facial muscles and for the muscle of the stirrup), thus evidencing the very close correspondence between listening and voice production. The conclusions reached by Tomatis are as follows: "the voice can only contain the frequencies that the ear can hear (the larynx emits only the harmonics that the ear can hear)" and "if one modifies the hearing, the voice is unconsciously and immediately modified" [4][5][6]. In 1957, the theory was experimentally corroborated by a team led by Raoul Husson in the Functional Physiology laboratory at the Sorbonne in Paris [7]. After this experiment, fewer than a dozen offshoots and related training systems have been developed based on this effect, with mild claims of effectiveness [8]. Only a fraction of these studies used the voice of the subject as auditory input. Our aim, in this preliminary work, is to test a new model of the Electronic Ear and the vocal variations that it causes in subjects emitting simple sounds (i.e., single sustained vowels) that are modified and fed back to them through special earpieces. This experiment was chosen to test the effectiveness of the device at a fundamental level, as a first step to map the actual capabilities of the device and of the method.
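As an illustration of the kind of processing such a feedback loop applies, the following Python sketch implements first-order low pass and high pass filters and maps them onto the four conditions named in this paper (NF, LP4K, HP4K, HP8K). It is only a sketch under stated assumptions: the actual filter type and order of the Electronic Ear are not described here, the first-order RC design is a stand-in, and the 44100 samples/s rate is borrowed from the recording setup purely for illustration. The function names are hypothetical.

```python
import math

FS = 44100  # assumed sample rate (samples/s), borrowed from the recording setup


def low_pass(samples, cutoff_hz, fs=FS):
    """First-order RC low pass filter (assumed design, not the device's)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output toward the current input
        out.append(y)
    return out


def high_pass(samples, cutoff_hz, fs=FS):
    """First-order RC high pass filter (assumed design, not the device's)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in samples:
        y = alpha * (y + x - prev_x)  # keep only the fast-changing part
        prev_x = x
        out.append(y)
    return out


def tone(freq_hz, n, fs=FS):
    """Unit-amplitude sine test tone with n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a sample list."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# The four feedback conditions, using the labels that appear in the paper.
CONDITIONS = {
    "NF": lambda s: list(s),              # unfiltered feedback (control)
    "LP4K": lambda s: low_pass(s, 4000),  # low pass at 4 kHz
    "HP4K": lambda s: high_pass(s, 4000), # high pass at 4 kHz
    "HP8K": lambda s: high_pass(s, 8000), # high pass at 8 kHz
}

if __name__ == "__main__":
    n = 4410  # 0.1 s of signal
    for name, apply_filter in CONDITIONS.items():
        for freq in (500, 12000):
            filtered = apply_filter(tone(freq, n))
            # Skip the first half to ignore the start-up transient.
            print(name, freq, "Hz ->", round(rms(filtered[n // 2:]), 3))
```

Running the script prints the level of a low and a high test tone under each condition, which makes the complementary behaviour of the low pass and high pass settings visible. A real device would likely use steeper filters than this first-order stand-in.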

II. METHODS
In total, 19 native Italian speakers (8 females and 11 males) without speech impairments were recruited. The experimental setup included a microphone (Shure BETA 58A, Beyerdynamic TG V56c), the Tomatis system (Brain-Activator MBL), an external recording device (M-AUDIO Fast Track Pro, sampling at 44100 samples/s), and a recording computer. The subjects stood in a pre-marked position with their back and head touching a wall. The microphone was placed in a fixed position. This preliminary experiment consisted of 4 segments, each composed of three sessions. In the first task of each segment, the subjects had to repeat 20 times the

THE TOMATIS ELECTRONIC EAR EFFECTS ON SIMPLE VOCALIZATIONS
M. Prenassi 1, W. Coppola 2, G. Ramponi 3, T. Agostini 2, S. Marceglia 1,3
Fig. 2: Difference for the F0 of letter /u/ between the HP4K method and control (NF), regarding the low F0 group.
Fig. 3: First and second formant for the all-subject average vowel triangle of the letter /a/, /i/, /u/ for the LP4K and HP4K methods. At the top the high F0 group and at the bottom the low F0 group methods with the significant F2 drifts for the letter /u/ (top) and /a/ (bottom).
low-F0 category. This classification was performed because group-specific filtering effects were observed in the preliminary analysis.
In the low-F0 group there was a significant increase by 4.3 Hz (p<0.04, Wilcoxon signed rank test, Fig. 2) in the fundamental frequency (F0) of the letter /u/ of the HP4K session. For the same vowel, between HP4K and LP4K, F2 increased by 68.23 Hz (p<0.006, Wilcoxon signed rank test) in the high-F0 group, and increased for the letter /a/ by 33.63 Hz (p<0.02, Wilcoxon signed rank test) for the low-F0 group, as shown in Fig. 3. The standard deviation analysis had a statistical significance only in the F2 formant for vowels /a/ and /i/ in the high-F0 group: it decreased from HP8K to HP4K and increased from HP4K or HP8K to LP4K. In the low-F0 group, the standard deviation had statistical significance only for the letter /i/: it increased in F1 for HP8K compared to controls, and for LP4K compared to controls. It decreased in F2 for HP8K compared to HP4K. Fig. 4 shows the average values of F0 for all the subjects in the high-F0 and low-F0 groups; all the letters showed a distinct increase between control NF and the methods, except for /a/ in the lower F0 group. However, these results are statistically different only for the letter /u/ in the lower F0 group between NF and HP4K (p<0.05, Wilcoxon signed rank test).
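The paired comparisons reported above rely on the Wilcoxon signed rank test. As a minimal sketch of how such a test works, the Python snippet below computes an exact two-sided version for paired values. It is not the analysis code used in this study: the function name `wilcoxon_signed_rank` is hypothetical, the choice of an exact (enumerated) p-value rather than a normal approximation is an assumption, and the F0 values are synthetic numbers invented for illustration only.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (W_plus, p_value). The null distribution is
    enumerated over all 2**n sign patterns, so this is only practical
    for small n (roughly 20 pairs or fewer).
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Test statistic: sum of the ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null hypothesis each rank is positive with probability 1/2.
    center = sum(ranks) / 2
    observed_dev = abs(w_plus - center)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Synthetic mean F0 values in Hz for 8 hypothetical subjects (NOT study data).
control_f0 = [110.0, 118.5, 121.0, 125.2, 130.4, 133.1, 140.0, 142.7]
filtered_f0 = [113.1, 121.0, 127.5, 126.0, 135.9, 138.3, 139.5, 150.4]

w_plus, p_value = wilcoxon_signed_rank(control_f0, filtered_f0)
print(f"W+ = {w_plus}, p = {p_value}")  # W+ = 35.0, p = 0.015625
```

With these invented values, seven of the eight subjects raise their F0 and only the smallest difference is negative, so only 4 of the 256 possible sign patterns are as extreme as the observed one, which gives p = 4/256.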

IV. DISCUSSION
As shown in the results section, a clearly audible and statistically significant response is evoked by the filtered Tomatis audio loop, even if a unified trend response among subjects is not clearly delineated. The first results we present in this paper indicate some ability of the Tomatis system to modify the covariances among F0 and the formants and, as a consequence, to act on general voice quality. Indeed, experiments in which the relationships among F0 and the F1 and F2 formants are synthetically modified are reported in the open literature (e.g., [11]). It was shown that the perceived quality of a voice depends on the covariance of the formants, which should correspond to an internalized representation of the human voice. Even the perception of the emotions of the speaker depends on the formants' properties [12]. Even if not statistically significant, the precise pattern followed by the all-subject average of F0 shown in Fig. 4 is also worth noting: the HP4K method is able to elicit bigger changes in the fundamental frequency compared to the controls.

V. CONCLUSION
In this preliminary experiment, we showed that the vocal changes elicited in the subjects by the Electronic Ear are quantifiable. The specific effects depend on the type of vocalization and on the class of the subject (high-F0 or low-F0). A clear trend in the fundamental frequency was detected only for the /i/ and /u/ vowels. The standard deviation analysis suggests that a central (4 kHz cut-off frequency) high pass voice filter tends to increase the sound variation. This initial work may be useful to understand the capabilities of the Electronic Ear and the Tomatis methodology.