A BIVARIATE PROBABILITY MODEL TO IDENTIFY “HONESTY” VERSUS “CHEATING” IN ECONOMIC SURVEYS: XENOPHOBIA IS ILLUSTRATED

Making successful policies for the manufacturing or globalized business sectors requires “honest” answers from the respondents in surveys. When some respondents cheat in their answers to the questions in a survey, for a variety of reasons, the survey results become useless. Currently, there is no appropriate probability model to address this phenomenon of “cheating” in a survey. An efficient methodology is needed to estimate the proportion of “cheaters” in a survey and to assess the statistical significance of the estimate. Such a methodology does not yet exist in the literature, and this article fulfils the need. In a pioneering manner, this article formulates a bivariate probability model to explain “cheating” in surveys. Several statistical properties of this new model are identified and explained. To illustrate the new probability model, the responses of 259 less educated and 217 more educated Germans about xenophobia are considered. Xenophobia is a stumbling block to successful business operations in this globalized economy. Two psychometric reasons behind xenophobia, the “insecurity level” and “social pressure”, are captured with the help of the probability model and explained. The model of this article helps to predict “honesty” versus “cheating” levels in a survey. A statistical testing procedure is prepared to check whether an estimated “social pressure” level is significant.


MOTIVATION
When the questions in a survey concern a sensitive topic or a respondent's illegal practices, respondents are understandably unwilling to answer. Surveys on sensitive topics are as common as those on non-sensitive topics in social, health, epidemiological and economic research; see Yahya and Adebayo (2013) for a health survey about breastfeeding by mothers. To circumvent such practical difficulties, Warner (1965) devised an ingenious and pioneering methodology called the Randomized Response Technique (RRT). In an RRT, every respondent uses a random device (such as rolling a die) in a private room and answers the sensitive questionnaire I if the outcome falls within a prescribed set of outcomes, or questionnaire II otherwise. The respondent does not have to reveal the outcome of the random device to anyone; hence, no one ever knows which questionnaire a particular respondent answered. This approach offers confidentiality to every respondent, and hence an RRT-based survey increases the likelihood of obtaining truthful answers.
Since Warner (1965), many articles and books on the RRT have been written by statisticians, sociologists, psychologists, economists, epidemiologists and marketing researchers, among other professionals. The most important ones are worth mentioning here in roughly chronological order. Greenberg et al. (1969) adapted unrelated questions in the RRT. Campbell and Joiner (1973) described how to get the answer without being sure you've asked the question. Campbell (1987) popularized the RRT among scientists. Goodstadt and Gruson (1975) used the RRT to test the efficacy of a drug. Maceli (1978) demonstrated how to ask sensitive questions without getting punched in the nose. See Chaudhuri and Mukerjee (1989) for an excellent comprehensive treatment of the RRT. Clark and Desharnais (1998) introduced an approach to obtain honest answers to embarrassing questions and to detect cheating in an RRT. Fox (2005) applied the RRT in item response theory models of educational studies. Fan and Chaloner (2006) designed a trinomial RRT to determine an optimal dose level in clinical trials. Guerriero and Sandri (2007) compared randomized response procedures. Lynn (2008) advised on how to deal with non-response in the RRT. Musch et al. (2001) promoted the RRT in survey research on the World Wide Web. Ostapczuk et al. (2009a) assessed sensitive attributes using the RRT and presented evidence for the importance of response symmetry. Using the RRT, Ostapczuk et al. (2009b) and Musch et al. (2001) examined whether education has a negative effect on resentful attitudes towards foreigners, a phenomenon called xenophobia. Interestingly, they first classified the survey outcomes as "honest yes", "honest no" and "cheating", and then provided reasons for xenophobia and asked whether respondents with low education exhibited more xenophobia than those with high education. Their work was incomplete only in the sense that it lacked an appropriate theoretical framework for assessing the statistical significance of the survey-based estimates of the proportions of "honest yes", "honest no" and "cheating". To supply the missing concepts and tools, this article develops a bivariate probability model and uses it to address "honesty" versus "cheating" in any survey. The contents of this article are explained and interpreted using the xenophobia data in Ostapczuk et al. (2009b). A few comments and directions for future research to improve survey methodology for capturing truthful responses are given at the end.

AJEBA

A BIVARIATE MODEL FOR "HONESTY" VERSUS "CHEATING" IN SURVEYS
In this section, we develop a bivariate probability model based on the psychometric characteristics of the respondents in a survey about xenophobia. A respondent (among a random sample of size n) might be xenophobic because of personal insecurity or social pressure; recently, Ankudinov and Lebedev (2014) studied the role of insecurity in employees' engagement in professional education. These two psychometric factors are the dominant reasons for a respondent to be an honest or a cheating type in a survey. Neither factor is directly measurable; hence, both are treated as parameters in this article.
To be specific, let 0 < φ < 1 be the unknown probability that a respondent is "insecure". Independently, a respondent might yield to "social pressure" with probability 0 < ρ < 1. Both parameters lie in the open interval (0, 1), because the framework is meaningless if all respondents in the survey are "insecure" or all yield to "social pressure". There are four distinct and mutually exclusive possibilities for any respondent: an "insecure cheater" with probability φρ; an "insecure honest" respondent, who answers "yes", with probability φ(1 − ρ); a "secure cheater" with probability (1 − φ)ρ; or a "secure honest" respondent, who answers "no", with probability (1 − φ)(1 − ρ). Every respondent falls into exactly one of these four categories, and the four probabilities sum to one.
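As a quick numerical check of the four categories, the following minimal Python sketch computes the four probabilities and the resulting trinomial PMF of the "honest yes", "honest no" and "cheating" counts. The helper names `category_probs` and `trinomial_pmf` are illustrative, not from the article, and the sketch assumes (as the model does) that insecure and secure cheaters are pooled into a single "cheating" cell with probability φρ + (1 − φ)ρ = ρ.

```python
from math import factorial, isclose


def category_probs(phi: float, rho: float) -> dict[str, float]:
    """Probabilities of the four mutually exclusive respondent types."""
    return {
        "insecure cheater": phi * rho,
        "insecure honest (answers yes)": phi * (1 - rho),
        "secure cheater": (1 - phi) * rho,
        "secure honest (answers no)": (1 - phi) * (1 - rho),
    }


def trinomial_pmf(x: int, y: int, n: int, phi: float, rho: float) -> float:
    """P(X = x, Y = y) with n - x - y pooled cheaters (both cheater types)."""
    if x < 0 or y < 0 or x + y > n:
        return 0.0
    z = n - x - y
    coef = factorial(n) // (factorial(x) * factorial(y) * factorial(z))
    return coef * (phi * (1 - rho)) ** x * ((1 - phi) * (1 - rho)) ** y * rho**z


# Illustrative parameter values only (not estimates from the survey).
probs = category_probs(0.6, 0.3)
print(probs)
print(isclose(sum(probs.values()), 1.0))  # the four types are exhaustive
```

Pooling the two cheater types is what turns four respondent types into three observable cells, which is why the model is bivariate in (X, Y).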
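Let X, Y and n − X − Y denote respectively the numbers of "honest yes" answers, "honest no" answers and "cheaters" in a survey answered by n respondents. Then the bona fide bivariate model for X and Y is the trinomial PMF

$$P(X=x,\,Y=y)=\frac{n!}{x!\,y!\,(n-x-y)!}\,[\varphi(1-\rho)]^{x}\,[(1-\varphi)(1-\rho)]^{y}\,\rho^{\,n-x-y},\qquad x,y\ge 0,\; x+y\le n. \qquad (1)$$

The number X of "honest yes" answers and the number Y of "honest no" answers marginally follow, respectively, the binomial patterns

$$P(X=x)=\binom{n}{x}[\varphi(1-\rho)]^{x}\,[1-\varphi(1-\rho)]^{n-x},\qquad x=0,1,\dots,n, \qquad (2)$$

and

$$P(Y=y)=\binom{n}{y}[(1-\varphi)(1-\rho)]^{y}\,[1-(1-\varphi)(1-\rho)]^{n-y},\qquad y=0,1,\dots,n. \qquad (3)$$

Science Publications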

The expected value and variance of X, the number of "honest yes" answers, are respectively the nonlinear functions

$$E[X]=n\varphi(1-\rho),\qquad \operatorname{Var}[X]=n\varphi(1-\rho)\,[1-\varphi(1-\rho)].$$

Likewise, the expected value and variance of Y, the number of "honest no" answers, are respectively

$$E[Y]=n(1-\varphi)(1-\rho),\qquad \operatorname{Var}[Y]=n(1-\varphi)(1-\rho)\,[1-(1-\varphi)(1-\rho)].$$

Furthermore, there exists an intrinsic relation between X and Y, namely X + Y ≤ n, which suggests that X and Y might be correlated. What is their correlation? The statistical dependence among observed variables remains of vital interest to data analysts (see Nasser, 2007, for details), and correlation has long been the basis for connecting two measurable factors; recently, for example, Olatayo (2011) used correlation to establish the similarities and differences between two minerals. Using the Probability Mass Function (PMF) (1), the correlation between X, the number of "honest yes" answers, and Y, the number of "honest no" answers, is

$$\operatorname{corr}(X,Y)=-\sqrt{\frac{\varphi(1-\varphi)(1-\rho)^{2}}{[1-\varphi(1-\rho)]\,[1-(1-\varphi)(1-\rho)]}}, \qquad (4)$$

which approaches minus one when the level of social pressure is negligible (ρ → 0) and increases monotonically towards zero as social pressure rises to its full level (ρ → 1). Hence, the number of "honest yes" answers can be predicted from a known number of "honest no" answers, and vice versa. For that, their conditional PMFs are required. First, the conditional PMF of Y, the number of "honest no" answers, given X = x, the number of "honest yes" answers, is

$$P(Y=y\mid X=x)=\binom{n-x}{y}\left[\frac{(1-\varphi)(1-\rho)}{1-\varphi(1-\rho)}\right]^{y}\left[\frac{\rho}{1-\varphi(1-\rho)}\right]^{n-x-y},\qquad y=0,1,\dots,n-x. \qquad (5)$$

The conditional mean E[Y | X = x] is the regression of y on a given x, and Var[Y | X = x] is the corresponding conditional variance. Both are derived from the PMF (5):

$$E[Y\mid X=x]=\frac{(1-\varphi)(1-\rho)}{1-\varphi(1-\rho)}\,(n-x)$$

and

$$\operatorname{Var}[Y\mid X=x]=\frac{(1-\varphi)(1-\rho)\,\rho}{[1-\varphi(1-\rho)]^{2}}\,(n-x).$$

The regression line therefore has intercept n(1 − φ)(1 − ρ)/[1 − φ(1 − ρ)], which increases with the number n of participants in the survey, and slope −(1 − φ)(1 − ρ)/[1 − φ(1 − ρ)], whose magnitude increases as the intercept increases.
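The moments and the correlation (4) can be checked by brute force. This minimal sketch (hypothetical helper names; illustrative parameter values, not estimates from the survey) enumerates every (x, y) pair of the PMF (1) for a small n and compares the exact moments with the closed forms. It also checks the limiting behavior of (4) and that the correlation rises monotonically in ρ; note that (4) does not depend on n.

```python
from math import comb, isclose, sqrt


def trinomial_pmf(x, y, n, phi, rho):
    """Joint PMF (1): x 'honest yes', y 'honest no', n - x - y cheaters."""
    p_yes, p_no = phi * (1 - rho), (1 - phi) * (1 - rho)
    return comb(n, x) * comb(n - x, y) * p_yes**x * p_no**y * rho ** (n - x - y)


def corr_formula(phi, rho):
    """Closed-form correlation (4) between X and Y."""
    p_yes, p_no = phi * (1 - rho), (1 - phi) * (1 - rho)
    return -sqrt(p_yes * p_no / ((1 - p_yes) * (1 - p_no)))


def moments_by_enumeration(n, phi, rho):
    """E[X], E[Y], Var[X], Var[Y] and corr(X, Y) computed directly from (1)."""
    cells = [(x, y, trinomial_pmf(x, y, n, phi, rho))
             for x in range(n + 1) for y in range(n + 1 - x)]
    ex = sum(x * p for x, _, p in cells)
    ey = sum(y * p for _, y, p in cells)
    vx = sum((x - ex) ** 2 * p for x, _, p in cells)
    vy = sum((y - ey) ** 2 * p for _, y, p in cells)
    cov = sum((x - ex) * (y - ey) * p for x, y, p in cells)
    return ex, ey, vx, vy, cov / sqrt(vx * vy)


n, phi, rho = 15, 0.6, 0.3
ex, ey, vx, vy, r = moments_by_enumeration(n, phi, rho)
print(r, corr_formula(phi, rho))  # enumeration agrees with (4)
```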
In statistics, the conditional variance is interpreted as the level of heterogeneity. Because the conditional variance changes with x, the randomized response survey is heterogeneous. Heterogeneity plays a significant role in scientific inquiries; for example, Chiarella and He (2005) used heterogeneity to understand the dynamics among producers.
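Likewise, the conditional PMF of X, the number of "honest yes" answers, given Y = y, the number of "honest no" answers, is

$$P(X=x\mid Y=y)=\binom{n-y}{x}\left[\frac{\varphi(1-\rho)}{1-(1-\varphi)(1-\rho)}\right]^{x}\left[\frac{\rho}{1-(1-\varphi)(1-\rho)}\right]^{n-x-y},\qquad x=0,1,\dots,n-y, \qquad (6)$$

with

$$E[X\mid Y=y]=\frac{\varphi(1-\rho)}{1-(1-\varphi)(1-\rho)}\,(n-y),\qquad \operatorname{Var}[X\mid Y=y]=\frac{\varphi(1-\rho)\,\rho}{[1-(1-\varphi)(1-\rho)]^{2}}\,(n-y).$$

The conditional mean is the regression of x on a given y, with intercept nφ(1 − ρ)/[1 − (1 − φ)(1 − ρ)], which increases with the number n of participants in the survey, and slope −φ(1 − ρ)/[1 − (1 − φ)(1 − ρ)], whose magnitude increases as the intercept increases.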
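To see the two regression lines in action, the sketch below (hypothetical function names, illustrative parameters) computes the conditional means and variances from the binomial forms of (5) and (6) and checks them against a direct renormalization of the joint PMF (1). The check also uses a symmetry of the model: swapping φ with 1 − φ exchanges the roles of X and Y.

```python
from math import comb, isclose


def trinomial_pmf(x, y, n, phi, rho):
    """Joint PMF (1): x 'honest yes', y 'honest no', n - x - y cheaters."""
    p_yes, p_no = phi * (1 - rho), (1 - phi) * (1 - rho)
    return comb(n, x) * comb(n - x, y) * p_yes**x * p_no**y * rho ** (n - x - y)


def y_given_x_moments(x, n, phi, rho):
    """Mean and variance of Y | X = x from (5): Binomial(n - x, q)."""
    q = (1 - phi) * (1 - rho) / (1 - phi * (1 - rho))
    return (n - x) * q, (n - x) * q * (1 - q)


def x_given_y_moments(y, n, phi, rho):
    """Mean and variance of X | Y = y from (6): Binomial(n - y, q)."""
    q = phi * (1 - rho) / (1 - (1 - phi) * (1 - rho))
    return (n - y) * q, (n - y) * q * (1 - q)


def y_given_x_by_enumeration(x, n, phi, rho):
    """The same conditional moments, obtained by renormalizing the joint PMF (1)."""
    w = [trinomial_pmf(x, y, n, phi, rho) for y in range(n - x + 1)]
    total = sum(w)
    mean = sum(y * p for y, p in enumerate(w)) / total
    var = sum((y - mean) ** 2 * p for y, p in enumerate(w)) / total
    return mean, var


n, phi, rho = 20, 0.6, 0.3
q = (1 - phi) * (1 - rho) / (1 - phi * (1 - rho))
# Regression of y on x: intercept n*q, slope -q.
print(f"E[Y | X = x] = {n * q:.3f} - {q:.3f} x")
```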
Most importantly, those who conduct a survey are interested in the number of "cheaters". Recall that X and Y denote respectively the numbers of respondents who give "honest yes" and "honest no" answers among the n respondents. Hence, Z = n − X − Y denotes the number of cheaters in the survey, and its PMF is

$$P(Z=z)=\binom{n}{z}\rho^{z}(1-\rho)^{n-z},\qquad z=0,1,\dots,n. \qquad (7)$$

The expected number and variance of the "cheaters" are respectively μ_Z = nρ and σ²_Z = nρ(1 − ρ). Because nρ(1 − ρ) is largest at ρ = 1/2 and shrinks as ρ moves towards 0 or 1, the survey becomes more homogeneous when the expected number of cheaters is either low or high. The predictability of "honesty" is closely connected to the level of "cheating" in a survey: the conditional PMF of X, the number of "honest yes" answers, and Y, the number of "honest no" answers, for a given level Z = z of "cheating" is

$$P(X=x,\,Y=y\mid Z=z)=\binom{n-z}{x}\varphi^{x}(1-\varphi)^{y},\qquad x+y=n-z. \qquad (8)$$

How good is the survey? A survey is good if the number of "honest" answers exceeds the number of "cheaters". For this purpose, the conditional PMF of the total number T = X + Y of "honest" answers in a survey, given the number Z = z of "cheaters", is needed; it is given by Equation (9). The conditional mean and variance of the PMF (9) mean the following. For a given number z of "cheaters" among the n respondents, the expected total number t of "honest" answers increases, with greater heterogeneity, as the probability (1 − ρ) that a respondent does not yield to "social pressure" increases. The probability (1 − ρ) that a respondent has not yielded to social pressure towards xenophobia is the slope (in magnitude) of the downward regression line of X + Y = t on Z = z.
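The marginal PMF (7) and the conditional PMF (8) can be verified in the same way. In this minimal sketch (hypothetical helper names, illustrative parameters), summing the joint PMF (1) over all (x, y) with x + y = n − z recovers the binomial PMF of Z, and dividing by it recovers X | Z = z ~ Binomial(n − z, φ). The last lines tabulate nρ(1 − ρ) to illustrate the homogeneity remark: the variance of Z is largest at ρ = 1/2.

```python
from math import comb, isclose


def trinomial_pmf(x, y, n, phi, rho):
    """Joint PMF (1): x 'honest yes', y 'honest no', n - x - y cheaters."""
    p_yes, p_no = phi * (1 - rho), (1 - phi) * (1 - rho)
    return comb(n, x) * comb(n - x, y) * p_yes**x * p_no**y * rho ** (n - x - y)


def cheaters_pmf(z, n, rho):
    """Binomial PMF (7) of the number Z of cheaters."""
    return comb(n, z) * rho**z * (1 - rho) ** (n - z)


def x_given_z_pmf(x, z, n, phi):
    """Conditional PMF (8): given Z = z, X ~ Binomial(n - z, phi) and Y = n - z - X."""
    return comb(n - z, x) * phi**x * (1 - phi) ** (n - z - x)


n, phi, rho = 12, 0.6, 0.3

# Variance n*rho*(1 - rho) of Z: largest at rho = 0.5, smaller towards 0 or 1.
for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(r, n * r * (1 - r))
```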
What about the converse? The conditional PMF of Z, the number of "cheating" answers in a survey, given the total number X + Y = t of "honest" answers, is given by Equation (10). The conditional mean of the PMF (10) is μ_{z|t} = E[Z | X + Y = t] = nρ − ρt, with the corresponding conditional variance. They mean the following. For a given total number X + Y = t of "honest" answers by the n respondents, the expected number z of "cheating" answers increases, with greater heterogeneity, as the odds ρ/(1 − ρ) that a respondent yields to "social pressure" increase. As t increases, the conditionally predicted number of "cheaters" in a survey decreases proportionally, at the rate ρ, the probability that a respondent yields to "social pressure".
Next, the parameters φ and ρ must be estimated for the model to be useful in practice. Maximum Likelihood Estimators (MLE) are preferred here because they are asymptotically efficient (see Kendall et al., 1994). To find the MLE, consider the responses (x_i, y_i, z_i), i = 1, 2, …, n, of the n ≥ 2 respondents as a random sample from the single-respondent version of the PMF (1), so that x_i + y_i + z_i = 1 indicates whether respondent i gave an "honest yes", an "honest no" or a "cheating" answer. Then, up to an additive constant, the log likelihood is

$$\ln L(\varphi,\rho)=\sum_{i=1}^{n}\left[x_i\ln\varphi+y_i\ln(1-\varphi)+(x_i+y_i)\ln(1-\rho)+z_i\ln\rho\right]. \qquad (11)$$

Differentiating separately with respect to φ and ρ, equating the derivatives to zero and solving yields the MLE

$$\hat\varphi=\frac{\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}(x_i+y_i)}, \qquad (12)$$

which is the ratio of the number of "honest yes" answers to the total number of "honest" answers, and

$$\hat\rho=\frac{1}{n}\sum_{i=1}^{n}z_i, \qquad (13)$$

where n denotes the number of respondents in the survey. Surveyors often wonder whether a sample estimate is statistically significant. To answer this question, a hypothesis testing procedure needs to be developed. Likelihood-based tests have good power properties, and the MLE is invariant (that is, the MLE of a function of the parameters is simply that function of the MLE). Hence, the likelihood approach is adopted here; see Kendall et al. (1994) for details about the likelihood ratio concept and tools.
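The closed-form estimators (12) and (13) can be sanity-checked numerically. The sketch below is a minimal illustration, assuming the log likelihood (11) is written in terms of the totals x = Σx_i, y = Σy_i and z = Σz_i; the function names are hypothetical and the counts are made up for illustration, not the data of Table 1. Moving either parameter slightly away from the closed-form values lowers the log likelihood, as expected at a maximum.

```python
from math import log


def log_likelihood(phi: float, rho: float, x: int, y: int, z: int) -> float:
    """Log likelihood (11), up to an additive constant, written with the totals
    x, y, z of 'honest yes', 'honest no' and 'cheating' answers."""
    return x * log(phi) + y * log(1 - phi) + (x + y) * log(1 - rho) + z * log(rho)


def mle(x: int, y: int, z: int) -> tuple[float, float]:
    """Closed-form MLE (12) and (13)."""
    n = x + y + z  # number of respondents
    return x / (x + y), z / n


# Hypothetical counts for illustration only (not the survey data of Table 1).
x, y, z = 30, 50, 20
phi_hat, rho_hat = mle(x, y, z)
best = log_likelihood(phi_hat, rho_hat, x, y, z)

# Perturbing either parameter should lower the log likelihood.
for d_phi in (-0.01, 0.0, 0.01):
    for d_rho in (-0.01, 0.0, 0.01):
        if (d_phi, d_rho) != (0.0, 0.0):
            assert log_likelihood(phi_hat + d_phi, rho_hat + d_rho, x, y, z) < best

print(phi_hat, rho_hat)  # 0.375 0.2
```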
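To be specific, suppose that the surveyors wonder whether an estimate ρ̂ (13) in the data is significant or negligible. If it is negligible, its p-value ought to be large. To find the p-value, the variance-covariance matrix of the MLE (12) and (13) is needed. The Fisher information matrix is diagonal:

$$I(\varphi,\rho)=\begin{pmatrix}\dfrac{n(1-\rho)}{\varphi(1-\varphi)} & 0\\[1ex] 0 & \dfrac{n}{\rho(1-\rho)}\end{pmatrix}.$$

The variance-covariance matrix is the inverse matrix I⁻¹. Because the information matrix is diagonal, the covariance matrix is also diagonal, with inverted elements:

$$\begin{pmatrix}\operatorname{var}(\hat\varphi) & \operatorname{cov}(\hat\varphi,\hat\rho)\\ \operatorname{cov}(\hat\varphi,\hat\rho) & \operatorname{var}(\hat\rho)\end{pmatrix}=I^{-1}=\begin{pmatrix}\dfrac{\varphi(1-\varphi)}{n(1-\rho)} & 0\\[1ex] 0 & \dfrac{\rho(1-\rho)}{n}\end{pmatrix}.$$

In particular, the covariance of the estimates (12) and (13) is zero. To test H₀: ρ = ρ₀, we can therefore use the statistic

$$W=\frac{\hat\rho-\rho_0}{\sqrt{\hat\rho(1-\hat\rho)/n}},$$

which approximately follows the standard normal distribution under H₀. Hence, the two-sided p-value of ρ̂ is p = 2[1 − Φ(|W|)], where Φ denotes the standard normal cumulative distribution function. The statistical power of accepting the alternative hypothesis H₁: ρ = ρ₁, where ρ₁ ≠ ρ₀, follows from the same normal approximation.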
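The testing step can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the article's exact procedure: it uses a two-sided Wald-type test of H₀: ρ = ρ₀ with the standard error √(ρ̂(1 − ρ̂)/n) taken from the inverse information matrix, and, for the power against H₁: ρ = ρ₁, it evaluates the standard error at ρ₁ and uses a significance level α = 0.05 by default. `statistics.NormalDist` from the Python standard library supplies Φ and its inverse; the function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def standard_errors(phi_hat: float, rho_hat: float, n: int) -> tuple[float, float]:
    """Square roots of the diagonal of the inverse information matrix."""
    se_phi = sqrt(phi_hat * (1 - phi_hat) / (n * (1 - rho_hat)))
    se_rho = sqrt(rho_hat * (1 - rho_hat) / n)
    return se_phi, se_rho


def wald_test_rho(rho_hat: float, n: int, rho0: float) -> tuple[float, float]:
    """Two-sided test of H0: rho = rho0; returns (statistic W, p-value)."""
    se = sqrt(rho_hat * (1 - rho_hat) / n)
    w = (rho_hat - rho0) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(w)))
    return w, p_value


def approx_power(rho0: float, rho1: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the two-sided test when the true value is rho1.
    Assumption: W is roughly N(shift, 1) under H1, with the SE evaluated at rho1."""
    se = sqrt(rho1 * (1 - rho1) / n)
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (rho1 - rho0) / se
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-crit - shift)


w, p = wald_test_rho(rho_hat=0.35, n=100, rho0=0.2)  # hypothetical inputs
print(round(w, 3), round(p, 4))
print(round(approx_power(rho0=0.2, rho1=0.4, n=100), 3))
```

Because the information matrix is diagonal, ρ̂ can be tested on its own without adjusting for φ̂.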

ILLUSTRATION
In this section, the results of the previous section are illustrated using the survey data about xenophobia in Ostapczuk et al. (2009a) and Musch et al. (2001). There were two groups: in group 1 (low education), a random sample of n₁ = 259 Germans, and in group 2 (high education), a random sample of n₂ = 217 Germans, were asked whether they hated foreigners. The numbers X, Y and Z are displayed in Table 1. Using the MLE (12) and (13), the estimated insecurity level and the estimated level of yielding to social pressure are found for each group and displayed in Table 1. Other results are also displayed in Table 1 and are interpreted below.
Notice that the estimated probability of being insecure is 0.61 in group 1 (low education) and 0.36 in group 2 (high education). The estimated proportion yielding to social pressure is 0.37 in group 1 but only 0.17 in group 2. These estimates indicate that education reduces xenophobia. The estimated probability of yielding to "social pressure" is significant in both groups. The statistical power against the alternative that the proportion yielding to social pressure is 0.40 is 0.946 in group 1 and 0.899 in group 2. Such high power supports the usefulness of the methodology of this article.

The expected numbers of "honest yes", "honest no" and "cheaters" are close to their observed counterparts. The correlation between the number of "honest yes" and the number of "honest no" answers is estimated to be −0.71 in group 1 (low education) and −0.83 in group 2 (high education), so education has an impact on this correlation as well.
For a one-unit increase in "honest yes" answers, the expected decrease in "honest no" answers is 12.28 in group 1 (low education) but only 1.13 in group 2 (high education). This again indicates that education makes a difference in the reduction of "honest no" answers.
The converse is more robust. That is, for a one-unit increase in "honest no" answers, the expected decrease in "honest yes" answers is 0.63 in group 1 (low education) and 0.83 in group 2 (high education); education makes a moderate difference in the reduction of "honest yes" answers. Furthermore, when the number of "cheaters" increases by one, the expected decrease in the total number of "honest" respondents is 0.63 in group 1 but 0.83 in group 2. Finally, for each additional "honest" respondent, the expected decrease in the number of cheaters is 0.37 in group 1 (low education) but only 0.17 in group 2 (high education), which again confirms an impact of education.

FUTURE RESEARCH DIRECTIONS
This article is the first of its kind to suggest the underlying probability structure for the "cheating" responses in a survey. It is customary to treat non-measurable characteristics of survey respondents as parameters. Accordingly, the non-measurable reasons "insecurity level" and "yielding to social pressure" in a survey about xenophobia are treated as parameters. In the Bayesian approach, the dynamics of the parameters are tracked and explained through a loss function and an optimality criterion; the prior and posterior distributions of the parameters need to be worked out, and Bayesian estimates are obtained differently from those of the classical (that is, frequentist) approach. This article has explored only the frequentist approach to "cheating" versus "honesty" in survey answers, not the Bayesian approach. Future research constructing a Bayesian approach to this topic is therefore recommended.

LIMITATIONS OF OUR METHODOLOGY
The bivariate probability model, the estimators of the model's parameters and the validity of the hypothesis tests all require a random sample. In surveys, commonly practiced data collection methods include systematic sampling, cluster sampling and snowball sampling. The contents of this article are unsuitable for non-random samples collected by such methods.

CONCLUSION
Education, as a covariate, is seen to make a significant difference not only with respect to xenophobia but also in the numbers of "honest yes", "honest no" and "cheating" answers in the survey. There could be many other covariates, and such covariates might not be orthogonal to each other but collinear.
Currently, there is no methodology to sort them out, and one needs to be developed. Plenty of health and medical data would benefit from such methodologies, whose scope extends beyond health to engineering, commerce and economics, among many other fields.