Simon Lindgren & Ragnar Lundström, Umeå University
This workshop will give an introduction to a method for analyzing language and discourse that retains the epistemology of cultural analysis while dealing with large datasets. This is done by applying “the constant comparative technique” in an analysis that employs tools from bibliometrics and social network analysis while reworking notions from discourse theory. The method is introduced in this hands-on workshop as Connected Concept Analysis (CCA). Participants will use the freely available online Textometrica toolkit to move from unprocessed full-text content to maps of connected concepts, like the following one (based on a dataset from a forum for domestic violence victims).
This technique can be applied in the same way to any large corpus of full-text data. CCA moves through six steps: atomization (splitting up and deconstructing the text), filtering (removing “noise”), conceptualization (qualitative coding of concepts), connection (analysis of co-occurrences between themes in text blocks), visualization (producing network graphs representing the discourse), and finally validation. The workshop will be led by Professor Simon Lindgren and Dr Ragnar Lundström. Both researchers use CCA in their research and have contributed to its development. Textometrica, developed by Lindgren together with programmer Fredrik Palm, was created specifically for doing CCA. Lindgren has presented a previous version of the method, using other, less tailored software, in a series of online workshops.
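To make the six steps concrete, the sketch below walks through them on a three-post toy corpus in Python. This is a minimal illustration only, not how Textometrica itself is implemented: the example posts, the stopword list, the concept coding scheme, and all names are assumptions made for the sake of the example.

```python
import itertools
import re
from collections import Counter

import networkx as nx  # used for the connection and visualization steps

# A toy corpus: each entry is one text block (e.g. one forum post).
text_blocks = [
    "He gets angry and I feel afraid at home",
    "The police did not help, I still feel afraid",
    "Getting help and support made me less afraid",
]

# 1. Atomization: split each block into lower-case word tokens.
tokenized = [re.findall(r"[a-z']+", block.lower()) for block in text_blocks]

# 2. Filtering: remove "noise" such as stopwords and (on real data) rare words.
stopwords = {"he", "and", "i", "at", "the", "did", "not", "me", "still"}
counts = Counter(w for block in tokenized for w in block)
filtered = [
    [w for w in block if w not in stopwords and counts[w] >= 1]  # raise threshold on real data
    for block in tokenized
]

# 3. Conceptualization: qualitative coding, where the researcher maps word
#    forms onto analytically meaningful concepts (this mapping is invented).
concept_map = {"afraid": "FEAR", "angry": "VIOLENCE", "police": "AUTHORITIES",
               "help": "SUPPORT", "support": "SUPPORT", "home": "HOME"}
coded = [{concept_map[w] for w in block if w in concept_map}
         for block in filtered]

# 4. Connection: count co-occurrences of concepts within the same text block.
cooccurrence = Counter()
for concepts in coded:
    for pair in itertools.combinations(sorted(concepts), 2):
        cooccurrence[pair] += 1

# 5. Visualization: build a network graph of connected concepts, weighted by
#    co-occurrence frequency (here simply printed instead of drawn).
graph = nx.Graph()
for (a, b), weight in cooccurrence.items():
    graph.add_edge(a, b, weight=weight)
print(graph.edges(data=True))

# 6. Validation: return to the text blocks behind an edge to check that the
#    connection is interpretively sound (here: FEAR co-occurring with SUPPORT).
for i, concepts in enumerate(coded):
    if {"FEAR", "SUPPORT"} <= concepts:
        print(text_blocks[i])
```

The crucial point the sketch tries to capture is that step 3 is a qualitative, interpretive act: the counting and graphing in steps 4 and 5 operate on researcher-defined concepts, not on raw word frequencies.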
In the face of the ready online availability of complex, text-based, large-scale datasets, there is an inescapable need to finally come to terms with the qualitative-quantitative divide in text analysis. One can no longer retreat into a locked position where the study of meaning-making relies on in-depth readings of a few cases, as this would make it obvious that the majority of the dataset was being neglected. Nor can one hide behind the claim that data in large numbers must be processed solely according to the rules and conventions of statistical inquiry, since this would make it equally apparent that the quality of the data (language, communication, cultural content) could not thereby be understood in all its complexity. Qualitative and quantitative social researchers could previously exist in boxes separated by walls of incommensurability, erected through their disparate choices for generating datasets of vastly differing character, walls that obscured the inevitable coexistence of qualitative and quantitative aspects in reality. But now, facing large online texts that lay bare the fact that meaning-making happens in large numbers, and that these large numbers in turn cannot be understood without in-depth interpretation, they must find entirely new approaches.
Today, there seems to be an increasingly widespread consensus that a viable solution to this dilemma is to combine qualitative and quantitative methods, benefiting from their respective strengths while balancing their respective weaknesses. However, most such “mixed methods” approaches rely on rigid definitions of the two paradigms to be combined, and suggest frameworks based on various forms of complementarity or “triangulation”. The qualitative tradition is seen as the more inductively, or “abductively”, oriented interpretive study of small numbers of observations, while the quantitative tradition is characterized as the deductively oriented statistical study of large numbers of cases. This has given rise to the common notion that qualitative research produces detailed accounts through close readings of social processes, while quantitative research renders more limited, but controllable and generalizable, information about causal relations and regularities in the social and cultural fabric.
On the internet, vast amounts of text data, including information about network links, are registered and aggregated independently of any initiative from researchers. Digital culture is one big data-collection machinery: where studying a specific form of interaction previously required designing and implementing an original study from scratch, the approach today is instead to find a slice of that interaction readily documented online, then to download and prepare the relevant parts for analysis. This means that the researcher’s choices regarding the design of the data are increasingly backgrounded in studies of digital culture, and that scholars increasingly face the challenge of thinking up and constructing “methods” after the fact, as illustrated in the sketch below. In short, there is a pressing need to find efficient ways of qualitatively interpreting large masses of text. Texts are irrevocably embedded in arbitrary systems of language and culture, from which their understanding must not be disconnected. While texts may be quantitatively deconstructed through approaches from content analysis, physics, or computational linguistics, many of these methods dissolve the data in ways that leave variable-oriented strategies as the only way to proceed with the analysis.
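As a hypothetical illustration of preparing already-documented online interaction for CCA, the snippet below fetches a forum thread and saves each post as one text block. The URL, the HTML selector, and the file name are all assumptions for the sake of the example; any real study would adapt them to the site in question (and respect its terms of use).

```python
import requests
from bs4 import BeautifulSoup

# Fetch a (hypothetical) forum thread that documents the interaction of interest.
response = requests.get("https://forum.example.org/thread/123")
soup = BeautifulSoup(response.text, "html.parser")

# Treat each forum post as one analyzable text block (assumed CSS selector).
text_blocks = [post.get_text(" ", strip=True)
               for post in soup.select("div.post-body")]

# Store one block per line, ready to be loaded into a CCA workflow.
with open("corpus.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(text_blocks))
```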
Prerequisites and outcomes
The workshop will be based on example data provided by the workshop leaders. Participants will be walked through the analytical steps in Textometrica and introduced to the required concepts along the way. This means that no specific prior skills are needed to take part. After completing the workshop, participants will have a basic understanding of the rationale for, and uses of, CCA, and enough hands-on skills to start working on their own analyses.