Research usually starts with the formulation of theories based on previous observations. From these theories, hypotheses are derived and then tested; the test determines whether they should be accepted or rejected. In some cases, the hypotheses can be tested by conducting an experiment; in others, observational or non-experimental designs are used.
Theories and hypotheses are formulated at the level of concepts. These concepts are mental representations, i.e. entities that exist in the mind but are not directly observable. To test the hypotheses, these concepts must be operationalized by specifying empirical indicators, or measures, for each of them. In observational designs, the measures are often survey questions.
However, there is no measurement without measurement error, and this holds for survey questions too: survey questions never measure the concepts of interest perfectly. A good operationalization should therefore select the question formulation that maximizes the strength of the relationship between the latent variable of interest (the concept) and the observed answer to the question (also called the indicator). In other words, a good operationalization should select the question formulation that minimizes the size of the measurement errors, or equivalently maximizes their complement, the measurement quality (Revilla, Zavala Rojas and Saris, 2016). This quality, which can be computed as the product of measurement validity and reliability, takes values from 0 to 1: the closer to 1, the smaller the measurement errors. Researchers should try to get as close as possible to this ideal situation.
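As a simple arithmetic illustration of this product (the numbers below are hypothetical, not estimates from any study):

```python
# Hypothetical example: quality as the product of reliability and validity.
reliability = 0.90  # proportion of variance free of random error (assumed)
validity = 0.85     # proportion of true-score variance free of method effects (assumed)

quality = reliability * validity
print(round(quality, 3))  # 0.765: closer to 1 means smaller measurement errors
```

A question can thus have high reliability but still mediocre quality if its validity is low, and vice versa.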
Nevertheless, even if the questions are designed very carefully, some measurement errors will always remain. This is an issue because measurement errors can strongly affect the results of a study (Saris and Gallhofer, 2014). In cross-cultural research, for instance, observed differences across countries may reflect not real differences but different sizes of measurement errors, which arise from asking the same question in different languages or cultures.
Therefore, correcting for these measurement errors is crucial (Saris and Revilla, 2016). However, this correction requires information about the size of the errors, i.e. the measurement quality (see DeCastellarnau and Saris, 2014, for details about how to do it).
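One common form of such a correction (sketched here with made-up numbers; the function name and values are ours) disattenuates an observed correlation between two variables by dividing it by the product of the two questions' quality coefficients, the square roots of their qualities:

```python
import math

def correct_correlation(r_observed, quality_1, quality_2):
    """Correct an observed correlation for measurement error,
    given the measurement quality (0-1) of the two questions."""
    # The quality coefficients are the square roots of the qualities.
    return r_observed / (math.sqrt(quality_1) * math.sqrt(quality_2))

# Hypothetical values: an observed correlation of .30 between two
# questions with measurement qualities of .64 and .81.
corrected = correct_correlation(0.30, 0.64, 0.81)
print(round(corrected, 3))  # 0.417
```

The corrected correlation is larger than the observed one: measurement error attenuates observed relationships, which is precisely why ignoring it can distort substantive conclusions.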
For the USA and Europe, a lot is already known about how the quality of questions depends on scale characteristics (e.g. Alwin, 2007; Saris and Gallhofer, 2014). In other parts of the world, however, this has not yet been studied. In addition, little is known about question quality in web surveys, in particular in countries where internet penetration is low.
Therefore, we implemented an experiment to estimate the quality of different questions in the opt-in online panel Netquest in Mexico and Colombia, where internet user penetration was still low when the fieldwork was carried out, in January–April 2013 (53% for Colombia and 44.9% for Mexico).1 The questionnaire replicated most of the core modules of round 4 of the European Social Survey (ESS). Our experiment focused on 27 questions about satisfaction with the economy, the government, and democracy, social trust, and trust in different institutions.
To estimate their measurement quality, we used a multitrait-multimethod (MTMM) approach. The MTMM approach, first introduced by Campbell and Fiske (1959), consists of repeating several correlated questions (each measuring one latent variable, called a "trait") using different methods. In our experiment, the same respondents received the same set of questions twice: at the beginning of the survey using one scale (e.g. an 11-point completely dissatisfied–completely satisfied scale) and at the end of the survey using a different scale (e.g. a 5-point agree completely–disagree completely scale). We estimated true score MTMM models, as proposed by Saris and Andrews (1991), separately for each country and topic.
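True score MTMM models are estimated with structural equation modeling software, which is beyond a short sketch. The core interpretation, however, can be illustrated with a small simulation: the quality is the squared correlation between the latent trait and the observed answer. All quality coefficients below are assumed for illustration, and the observed answers are treated as continuous rather than discrete scale points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large sample so the estimates are stable

# One latent trait, e.g. satisfaction with the economy (standardized).
trait = rng.normal(size=n)

# The same trait measured with two scales ("methods") of different,
# assumed quality coefficients.
q1, q2 = 0.9, 0.7
y1 = q1 * trait + np.sqrt(1 - q1**2) * rng.normal(size=n)  # first scale
y2 = q2 * trait + np.sqrt(1 - q2**2) * rng.normal(size=n)  # second scale

# The squared trait-indicator correlation recovers the quality.
est_q1 = np.corrcoef(trait, y1)[0, 1] ** 2
est_q2 = np.corrcoef(trait, y2)[0, 1] ** 2
print(round(est_q1, 2), round(est_q2, 2))  # close to 0.81 and 0.49
```

The full model additionally separates method variance from random error, which is what allows quality to be decomposed into validity and reliability; this sketch only shows why a higher quality means a stronger link between concept and answer.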
The main results from these analyses are the following. First, the quality estimates vary between .41 and .98 depending on the question, scale and country. The size of the measurement errors thus depends strongly on the exact question and scale formulation, meaning that it is crucial to pay special attention to the preparation of each question.
In particular, we found several aspects related to differences in quality. In both countries, agree–disagree scales led to lower quality than scales referring directly to the latent dimension of interest (e.g. dissatisfied–satisfied). In addition, the use of fixed reference points for the two end points of the scale (i.e. labels that indicate clearly that the end points of the scale are also the ends of the latent dimension, for instance "completely" or "totally" satisfied/agree) increased the quality. Finally, increasing the number of response categories (up to 11) generally improved the quality of scales referring directly to the latent dimension of interest.
Thus, these first results on question quality for Central and Latin American countries show: 1) that the quality estimates in our analyses are quite similar to what has been found in the USA and Europe (for example, compared with the results in Saris and Gallhofer, 2014), and 2) that the relationships between quality and scale characteristics (e.g. agree–disagree scale or not, use of fixed reference points, number of response categories) are also in line with previous research based on the US and Europe. Similar practical recommendations can therefore be made: for instance, avoid using agree–disagree scales, and use fixed reference points at the ends of your scales whenever possible. For further details about the study and its results, see Revilla and Ochoa (2015).
1 Internet user penetration for 2013 according to the Statista website: see https://www.statista.com/statistics/379965/columbia-internet-user-penetr... and https://www.statista.com/statistics/379973/mexico-internet-user-penetrat...
Alwin, D.F. (2007). Margins of Error: A Study of Reliability in Survey Measurement. Hoboken, NJ: John Wiley and Sons, Inc.
Campbell, D.T., and D.W. Fiske (1959). "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix." Psychological Bulletin, 56(2): 81-105.
DeCastellarnau, A., and Saris, W. E. (2014). A simple way to correct for measurement errors in survey research [online]. ESS Edunet module, retrieved from http://essedunet.nsd.uib.no/cms/topics/measurement/
Revilla, M., and C. Ochoa (2015). “Quality of Different Scales in an Online Survey in Mexico and Colombia”. Journal of Politics in Latin America, 7(3): 157–177. Available at: http://journals.sub.uni-hamburg.de/giga/jpla/article/view/903/910.
Revilla, M., Zavala Rojas, D., and W.E. Saris (2016). “Creating a good question: How to use cumulative experience”. In Christof Wolf, Dominique Joye, Tom W. Smith and Yang‐Chih Fu (editors), The SAGE‐Handbook of Survey Methodology. SAGE. Chapter 17, pp.236-254.
Saris, W.E., and F.M. Andrews (1991). "Evaluation of Measurement Instruments Using a Structural Modeling Approach." In Measurement Errors in Surveys, pp. 575-599.
Saris, W.E., and I.N. Gallhofer (2014). Design, Evaluation, and Analysis of Questionnaires for Survey Research (2nd ed.). Hoboken, NJ: John Wiley and Sons, Inc.
Saris, W.E., and M. Revilla (2016). “Correction for measurement errors in survey research: necessary and possible”. Social Indicators Research, 127(3): 1005-1020. First published online: 17 June 2015. DOI: 10.1007/s11205-015-1002-x Available at: http://link.springer.com/article/10.1007/s11205-015-1002-x