Chapter 21 From Frequency Counts to Contextualized Word Embeddings
The Saussurean turn in automatic content analysis
Text, the written representation of human thought and communication in natural language, has been a major source of data for social science research since the field's beginnings. Quantitative approaches seek to make certain contents measurable, for example through word counts or the reliable categorization (coding) of longer text sequences, whereas qualitative social researchers put more emphasis on systematic ways to generate a deep understanding of social phenomena from text. For the latter purpose, several qualitative research methods have been developed, such as qualitative content analysis (Mayring, 2010), grounded theory methodology (Glaser & Strauss, 2005), and (critical) discourse analysis (Foucault, 1982). Although their methodological foundations differ widely, both currents of empirical research rely to some extent on interpreting text data against the background of its context. Since the global expansion of the internet and the emergence of social networks at the latest, the sheer mass of available text data has posed a significant problem for empirical research that relies on human interpretation. For their studies, social scientists have access to newspaper texts representing public media discourse; web documents from the websites of companies, parties, or NGOs; political documents from legislative processes, such as parliamentary protocols, bills, and corresponding press releases; and, for some years now, micro-posts and user comments from social media. Such collections can easily comprise millions of documents, so computational support is indispensable even for processing samples of them.
Keywords: survey data, data analysis, data science, information technology, AI, socio-robotics, quantitative, survey methodology, ethics, ethical standards, privacy, replication, politics, survey design, social media, big data, social, human-robot interaction, machine learning, open data, data archives, data ownership, digital trace, unstructured data
ISBN: 9780367457808, 9781032077703, 9781003025245
Publisher: Taylor & Francis
Publication date and place: 2022