
Cho, MJ, Eltinge, JL and Swanson, D (2003)

Inferential methods to identify possible interviewer fraud using leading digit preference patterns and design effect matrices

Proceedings of the American Statistical Association.

ISSN/ISBN: Not available at this time. DOI: Not available at this time.



Abstract: Interviewer fraud can severely damage data quality. How can we detect it? Turner et al. (2002) used response patterns to detect falsification. They reported that suspected falsifiers could be identified by an unexpectedly high yield of interviews per assigned sample address, and/or by unusual response rates for specific reported variables on behaviors. Turner et al. also discussed the systematic differences between suspected falsifiers and other interviewers in providing means of verification, such as the telephone numbers of the respondents. Biemer & Stokes (1989) proposed a statistical model for describing dishonest interviewer behavior, which was applied to a general quality control sample design and several associated cost models. A 1982 U.S. Bureau of the Census study indicated a higher degree of cheating in urban areas (Biemer & Stokes). The study also showed a substantial and highly significant tendency for relatively inexperienced interviewers to cheat more frequently in the two largest demographic surveys, the Current Population Survey and the National Crime Survey (Biemer & Stokes). In this paper, we used leading digits to detect curbstoning. The effect of the sampling design, such as stratification and clustering, on standard Pearson chi-squared test statistics for goodness of fit is investigated. Statistical methods for analyzing cross-classified categorical data have been extensively developed under the assumption of multinomial sampling. However, most commonly used survey designs involve clustering and stratification, and hence the multinomial assumption is violated (Rao & Scott, 1981).
The literature has shown that clustering can have a substantial effect on the distribution of the standard Pearson chi-squared test statistic, χ², and that some adjustment to χ² may be necessary, without which one can get misleading results in practice (Rao & Scott, 1981). Rao & Scott developed a simple correction to χ² that requires only knowledge of the design effects (deffs), or variance estimates, for individual cells in the goodness-of-fit problem (Rao & Scott, 1981). The original Rao & Scott papers considered inference for one vector of proportions, based on (essentially) one sample. In this paper, we consider inference for a large number of proportion vectors p_i, i = 1, ..., I, where I is the total number of interviewers. Only a small portion of an interviewer's workload can be verified because of limited resources. Therefore, we addressed the optimum allocation of resources, such as re-interview time, using the optimal decision rule.
The views expressed in this paper are those of the authors and do not necessarily reflect the policies of the U.S. Bureau of Labor Statistics.
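
To make the goodness-of-fit adjustment described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Benford reference distribution for leading digits (the abstract does not name the reference distribution) and uses the first-order Rao & Scott form, in which the Pearson statistic is divided by an average of the cell design effects weighted by p̂(1 − p̂)/p0. The function names `leading_digit` and `rao_scott_first_order`, and the example counts and deffs, are hypothetical.

```python
import math

# Assumed reference distribution: Benford's law for leading digits 1..9.
# The abstract does not say which reference distribution the authors used.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x):
    """Return the first nonzero digit (1-9) of a nonzero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the leading digit first, e.g. '3.000...e-01'.
    return int(format(abs(x), ".15e")[0])


def rao_scott_first_order(counts, p0, deffs):
    """Pearson X^2 for goodness of fit with a first-order design-effect correction.

    counts: observed cell counts for one interviewer
    p0:     hypothesized cell proportions (sum to 1)
    deffs:  estimated design effect for each cell (assumed to be given)
    Returns (x2, lam, x2_corrected, df).
    """
    n = sum(counts)
    k = len(counts)
    p_hat = [c / n for c in counts]
    # Standard Pearson statistic under multinomial sampling.
    x2 = n * sum((ph - p) ** 2 / p for ph, p in zip(p_hat, p0))
    # Average generalized design effect (mean eigenvalue estimate).
    lam = sum(d * ph * (1 - ph) / p for d, ph, p in zip(deffs, p_hat, p0)) / (k - 1)
    if lam <= 0:
        raise ValueError("correction factor must be positive")
    # The corrected statistic is referred to chi-squared with k - 1 df.
    return x2, lam, x2 / lam, k - 1


# Hypothetical example: one interviewer reports uniformly spread leading digits.
counts = [100] * 9
deffs = [1.5] * 9  # made-up design effects for illustration only
x2, lam, x2_c, df = rao_scott_first_order(counts, BENFORD, deffs)
print(f"X2={x2:.1f}, lambda={lam:.2f}, corrected X2={x2_c:.1f}, df={df}")
```

In a setting with I interviewers, a function like this would be applied once per interviewer, with the corrected statistic compared against a chi-squared reference with k − 1 degrees of freedom. How the deffs are estimated per interviewer is left open here.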


Bibtex:
Not available at this time.


Reference Type: Journal Article

Subject Area(s): Statistics