How to code qualitative data? (Part 1)

Anyone who has ever worked with qualitative data knows how messy it can be without a rigorous methodological strategy. Whether the material consists of focus group transcripts, observational notes, open-ended survey questions, or other text documents, efficient analysis becomes a challenge as the quantity increases. Comparing this with quantitative data analytics is of little use, since the goal is usually different here: we want to increase our understanding or formulate hypotheses. So let’s imagine, dear reader, that data collection is already behind us and we want to analyze a series of semi-structured in-depth interviews. We already have transcripts running to hundreds of pages.

Coding is the first analytical step, and it connects data collection with interpretation. Our codes express how we have sorted, segmented, and grouped our data. One way to do this is by adopting grounded theory methods (Glaser & Strauss, 2017). These methods are flexible but systematic guidelines that aim to construct theories based on the data. Coding here is the bridge between collecting data and developing an emergent theory. Kathy Charmaz, the developer of Constructivist Grounded Theory, lists three types of grounded theory coding in her book Constructing grounded theory: A practical guide through qualitative analysis (Charmaz, 2006).

  1. Initial coding (often done line by line) is the first step toward conceptualizing ideas. Here you segment the data into small sections and label them in a condensed way. These labels will help you to develop abstract ideas. At this point, you remain open to all possible theoretical directions. You ask questions like ‘What is this data set about?’, ‘What does it suggest?’, ‘From whose point of view?’, and ‘What theoretical category might be relevant?’ If you work in a team, you can divide the sections among yourselves and then combine your coding. Keeping your mind open can be challenging. You may feel lost in the vast number of statements, thoughts, and feelings. However, moving away from well-known theoretical conceptions will allow you to discover new fields, holes, or gaps. This approach can be especially fruitful in a mixed-methods design, where the forthcoming quantitative strand can be largely shaped by the findings of the first, exploratory qualitative strand. Charmaz suggests being fast and spontaneous at this stage to maintain a fresh view. Do not hesitate to change your first code if you are not satisfied with its language, neutrality, or focus. The shorter and simpler your codes are, the easier it is to navigate among them. You can apply word-by-word coding when you need more nuance, use the participants’ own special terms (in vivo codes), or apply line-by-line coding, which breaks the text into components. The advantage of line-by-line coding is that, as a first step, you don’t have to consider the whole picture but gain insight ‘line by line’. You can pay more attention to the actual processes and actions: how they develop, how they change, and what their consequences are. Incident-by-incident coding, on the other hand, means that you compare incidents or observations, searching for dissimilarities.
  2. The second step, focused coding, permits you to separate, sort, and synthesize large amounts of data. This step uses the most significant or frequent initial codes and lets you start a theoretical integration. Deciding which codes are the most significant is always up to the researcher’s judgment. This step can hardly be separated from the analytical phase, since the researcher is already forming ideas and conceptions while arranging the sections and codes.
  3. Axial coding is an alternative to focused coding offered by Strauss and Corbin (1990), and it aims to relate categories to subcategories. It identifies an axis and a network of related categories and subcategories that together provide a coherent picture. This type of coding is more conceptual than descriptive.
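
The selection step in focused coding can be illustrated with a small tally of initial codes. The Python sketch below is only an illustration under stated assumptions: the transcript segments, the code labels, and the helper name `frequent_codes` are invented for this example and do not come from Charmaz. Frequency is also only one signal; which codes are significant remains the researcher’s call.

```python
from collections import Counter

# Hypothetical transcript segments with their initial codes (invented for illustration).
coded_segments = [
    ("I answer emails before breakfast.", ["blurring work and home", "starting early"]),
    ("My supervisor expects replies at night.", ["feeling pressure", "blurring work and home"]),
    ("I feel guilty when I take a day off.", ["feeling guilty", "feeling pressure"]),
    ("Weekends are for catching up on writing.", ["blurring work and home"]),
]


def frequent_codes(segments, min_count=2):
    """Return (code, count) pairs for codes used at least min_count times, most frequent first."""
    counts = Counter(code for _, codes in segments for code in codes)
    return [(code, n) for code, n in counts.most_common() if n >= min_count]


# Candidate codes to carry into focused coding.
candidates = frequent_codes(coded_segments)
print(candidates)
```

A list like this might serve as a starting shortlist. A rare code can still be the most significant one, so the tally should support the decision rather than replace it.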


Use software!

Even Excel can work if your data set is not that large, but I would recommend software designed for this purpose, because it makes the work faster and more efficient. A good option is NVivo, which is unfortunately not free but offers a lot of possibilities. I used it for the first time more than 5 years ago, and the software has developed a lot since then. You can import Word and PDF files (audio, video, and picture files too, but normally I work with transcripts), create a coding table, create sub-codes or combine codes, visualize your data, create a mind map, write memos, or work with a team in a collaborative environment, just to mention some of its features.
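
For a small data set handled in a spreadsheet, a coding table can be as simple as one row per coded segment. The sketch below is an assumption about what such a table might look like, not a description of how NVivo stores its data: the column names, the example rows, and the helper name `coding_table_csv` are all hypothetical.

```python
import csv
import io

# Hypothetical column layout for a simple coding table.
FIELDS = ["interview", "segment", "code", "sub_code"]

# Invented example rows, one per coded segment.
rows = [
    {"interview": "P01", "segment": "I answer emails before breakfast.",
     "code": "blurring work and home", "sub_code": "starting early"},
    {"interview": "P02", "segment": "I feel guilty when I take a day off.",
     "code": "feeling guilty", "sub_code": ""},
]


def coding_table_csv(table_rows):
    """Serialize coded segments as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(table_rows)
    return buffer.getvalue()


print(coding_table_csv(rows))
```

Saving the returned text to a `.csv` file gives a table that can be sorted and filtered by code in Excel.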

Stick close to the data!

As you code, you can already consider which theoretical categories these statements might indicate. Some codes will be central to analyzing a story, some will give the reasons, and others the context. Some will be less apparent than others, but you shouldn’t forget to stick close to the data.

Don’t force your preconceptions!

If you apply grounded theory methods, you create the codes from the data rather than from preconceived categories. Let coding lead you to unforeseen domains! It can be a challenge to put aside your preconceptions, but being self-reflexive throughout the process can help you. Some preconceptions related to the researcher’s class, race, gender, age, values, etc. may permeate the analysis without the researcher being aware of it. The data have to support every conception the researcher applies; these conceptions should not stem from the researcher’s own world view. With some topics this is easy to manage, while others need special attention.

Be conscious of your level of involvement!

Sometimes a researcher has personal experience with the problem being researched; in other words, there is personal involvement. I have just read a Ph.D. dissertation about the work-life balance of female Ph.D. students. The author, a female Ph.D. student herself, had to take care not to be biased by her own experiences, problems, and challenges, and had to avoid focusing more on the positive or the negative aspects just because they dominated her own life. This is an ethical issue that is not specific to coding; it also arises during the analytical phase of the work.

Be sensitive to the language your respondents use!

This can be a critical success factor for your research project. Language can reflect values and world views precisely. During coding, the researcher has to pay special attention to this; otherwise, one may uphold hidden assumptions instead of capturing the empirical reality. In vivo coding can be especially useful if you want to perform a culturally responsive analysis.

Be specific!

Codes should not be too general. They should identify concrete actions and processes; otherwise, you may overlook how people construct those actions and processes. You can apply action-based coding (process coding) when you want to indicate movement or procedure. Descriptive coding, however, can be an option if you want to summarize extracts with a short label. Structural coding works best if you want to highlight the structural attributes of a data set (who, what, where, how, when). Values coding, by contrast, is about the participants’ worldview, beliefs, and attitudes. Don’t forget that you will use the codes to analyze, not to summarize!

Avoid using synonyms!

If you see that two or more codes are too similar, you can merge them into one or make the labels you use more specific. Otherwise, you will end up working with a redundant codebook.
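
Spotting and merging overlapping codes can be partly supported by a short script. The following Python sketch rests on several assumptions: the codebook is a plain dictionary mapping code labels to segment IDs, and the labels, IDs, function names, and the 0.8 threshold are all hypothetical. It compares labels character by character, so it only catches near-duplicate wording such as ‘feeling pressure’ and ‘feeling pressured’. True synonyms still have to be found by reading the codebook.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_code_pairs(labels, threshold=0.8):
    """Return pairs of labels whose character-level similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(labels), 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


def merge_codes(codebook, keep, drop):
    """Return a new codebook in which the segments coded as `drop` are moved under `keep`."""
    merged = {code: list(segments) for code, segments in codebook.items() if code != drop}
    merged.setdefault(keep, []).extend(codebook.get(drop, []))
    return merged


# Hypothetical codebook: code label -> IDs of the segments it was applied to.
codebook = {
    "feeling pressure": ["P01-03", "P02-11"],
    "feeling pressured": ["P04-07"],
    "starting early": ["P01-01"],
}

print(similar_code_pairs(codebook))
print(merge_codes(codebook, keep="feeling pressure", drop="feeling pressured"))
```

The merge returns a new dictionary and leaves the original untouched, which makes it easy to go back if the two codes turn out to mean different things after all.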

In sum, coding is a process of discovery, especially in grounded theory. The methods and tips above are not strict rules but flexible guidelines. You can go back at any time to change the labels or even start the coding afresh. Moving from concrete reports of events to theoretical insights is an adventure, as Charmaz pointed out. In vivo coding, process coding, descriptive coding, structural coding, and values coding are all alternatives; the choice should always be based on the research topic and the characteristics of the participants.

The next part of this blog series will be about other ways (template method, editing, and immersion/crystallization) to sift, sort, and synthesize qualitative data.

Recommended literature:

Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis. Sage.

Glaser, B. G., & Strauss, A. L. (2017 [1967]). The discovery of grounded theory: Strategies for qualitative research. Routledge.

Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Sage.