IBUC 2006 Abstract

Computer Assisted Coding by Interviewers
Wim Hacking, John Michiels & Saskia Janssen-Jansen; Statistics Netherlands, The Netherlands

Coding activities play an important role in a statistical production process. These activities are usually associated
with the assignment of responses from respondents to predefined codes in a classification, so that these responses
become available for further data-editing operations. Coding can be done at various stages of statistical production:
during data collection by respondents or interviewers or during the data-editing process by coding experts and/or
automated systems. Examples of more difficult coding activities are the assignment of a respondent’s educational
background or occupation to a corresponding classification. In these examples usually more than one (open or
closed) question is involved in gathering the information required to assign valid codes.

In previous years the approach adopted at Statistics Netherlands for coding (open text) responses from a number
of related questions was the following. First, the answers to these questions are collected using CAPI or CATI
modes of data collection. At the statistical office the collected and sometimes edited information is then fed to a
‘batch’ process for automated coding. The records that are not coded are classified interactively by coding experts
in a second pass. The batch process can involve any technique of fully automated coding, but at Statistics
Netherlands it relied on ‘handmade’ dictionaries. In this process, open text answers are first edited to remove
misspellings, special characters and certain text strings. These answers are then compared to words and word
combinations in a dictionary. If an answer matches a text string in the dictionary, a unique
classification is possible. Only a small percentage of cases (10-20%) were successfully coded in the ‘batch’
process.
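
The dictionary-based batch step described above can be illustrated with a minimal Python sketch. It is not the system used at Statistics Netherlands: the dictionary entries, the placeholder codes (OCC-01 etc.), the spelling corrections, the filler words and the function names are all hypothetical, chosen only to show the edit-then-match idea and the hand-over of unmatched records to coding experts.

```python
import re

# Hypothetical 'handmade' dictionary: cleaned answer text -> placeholder code.
DICTIONARY = {
    "primary school teacher": "OCC-01",
    "truck driver": "OCC-02",
    "nurse": "OCC-03",
}

# Hypothetical misspelling corrections and filler words to strip.
SPELLING_FIXES = {"teecher": "teacher", "nurce": "nurse"}
NOISE_WORDS = {"i", "am", "a", "an", "work", "as"}


def normalize(answer):
    """Edit an open text answer: lowercase, drop special characters,
    correct known misspellings and remove filler words.
    Simplification: only ASCII letters and digits are kept."""
    text = re.sub(r"[^a-z0-9\s]", " ", answer.lower())
    words = [SPELLING_FIXES.get(w, w) for w in text.split()]
    return " ".join(w for w in words if w not in NOISE_WORDS)


def batch_code(answers):
    """Return (coded, uncoded): records with an exact dictionary match
    receive a code; the rest are left for interactive coding."""
    coded, uncoded = {}, []
    for record_id, answer in answers.items():
        code = DICTIONARY.get(normalize(answer))
        if code is None:
            uncoded.append(record_id)
        else:
            coded[record_id] = code
    return coded, uncoded


answers = {
    1: "I am a Truck-driver!",
    2: "Nurce",
    3: "something at a bank",
}
coded, uncoded = batch_code(answers)
```

In this sketch records 1 and 2 are coded by exact match after editing, while record 3 has no dictionary entry and is passed on for interactive coding. Exact matching of this kind only succeeds when the dictionary anticipates the wording of an answer, which is consistent with the low batch coding rate reported above.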

We will discuss an alternative coding technique that has been developed at Statistics Netherlands: Computer
Assisted Coding by Interviewers (CACI). The new technique is more cost-effective than traditional approaches, with a
comparable level of detail and reliability. Two different computer-assisted coding techniques have been
implemented and are currently used for three different classifications: education, occupation and economic activity
of businesses. The first strategy is based on an approach described in [1]; the other is, to our knowledge,
newly developed. The percentage of entities coded is 95% for economic activity of businesses, 75% for
occupations and 82% for education. The quality of these codes can be expressed as the percentage correctly
coded: 93%, 90% and 87% for economic activity of businesses, occupation and education, respectively.

Contact: whcg@cbs.nl