Randomized Response Procedure for the Estimation of the Population Ratio using Ranked Set Sampling

Corresponding Author: Agustín Santiago, Facultad de Matemáticas, Universidad Autónoma de Guerrero, Acapulco, Guerrero, México. Email: asantiago@uagro.mx

Abstract: In this study we deal with the estimation of the population ratio, when a Randomized Response (RR) procedure is used for collecting responses and Ranked Set Sampling (RSS) is the selection method. The variances of the suggested estimators are calculated. Comparisons between different estimators are presented.


Introduction
The common model considers that we are interested in the study of Y, a sensitive variable evaluated in a finite population $U = \{u_1, u_2, \ldots, u_N\}$, where each $u_i$ is an identifiable unit. Some values of Y identify the unit as carrying a stigma. Hence stigmatized individuals will tend to give incorrect reports on Y or to refuse to answer. A solution is to use a Randomized Response (RR) query. The seminal work on RR is due to Warner (1965). Warner's method consisted in including two alternative questions: the question associated with the stigma and an insensitive one. The interviewee chooses one of the questions at random and gives an answer without revealing which question he/she has selected. A similar reasoning can be used in the case of a quantitative variable; see Chaudhuri and Stenger (2006), for example, for a discussion on RR procedures when we deal with a quantitative character.
RR models are still in development owing to both their practical and their theoretical interest; see the papers of Singh and Singh (1993) and Christofides (2003), for example. Authors commonly study the behavior of their proposals when simple random sampling is the design used for selecting the samples. See Rueda and González (2004) and Singh and Tarray (2014a; 2014b) for a comprehensive look at this problem.
RSS is a relatively new sampling design which outperforms Simple Random Sampling With Replacement (SRSWR). The seminal paper is due to McIntyre; see Chen et al. (2004). The units may be ranked cheaply and then an Order Statistic (OS) is selected from each provisionally selected sample. The provisional samples are selected using SRSWR. It has been proved that RSS generally increases the accuracy of the estimators.
Some interesting recently published results are due to Al-Saleh and Al-Omari (2002), who suggested multistage ranked set sampling for estimating the population mean; Bouza (2010), who considered the estimation of the mean of a sensitive quantitative character in RSS using auxiliary variables for RR procedures; and Chen and Lim (2011), who considered the estimation of variances of strata in RSS. See Patil (2002), Patil et al. (1994; 1999), Bouza and Al-Omari (2010), Al-Omari (2011), Jemain and Al-Omari (2006) and Chen et al. (2004) for a detailed discussion on RSS.
In this study we consider the ratio estimation problem. Let X be a known variable, highly correlated with Y, which is used both for selecting the ranked sample and for computing the estimate of the ratio $\zeta = \mu_Y/\mu_X$, where $\mu_X$ and $\mu_Y$ are the population means of X and Y, respectively.
The remaining part of the paper is organized as follows. Section 2 is concerned with a model-based RR procedure when SRSWR is used. An RSS design with RR procedures is developed in section 3. The different estimators are compared in section 4. In section 5, an empirical comparison of the proposed estimators is presented. Conclusions are given in section 6.

A Scrambled Variable RR Procedure under SRSWR
We briefly describe the RR procedure developed by Chaudhuri and Stenger (1992), which serves as an illustrative model. For a unit $u_i \in U$ the sampler determines the sets of known variables $A = \{A_1, A_2, \ldots, A_T\}$ and $B = \{B_1, B_2, \ldots, B_S\}$.
Once they are fixed, each selected $u_i \in U$ does not respond directly to the sensitive question by reporting the value of $Y_i$. Instead, the unit (individual) performs a random experiment and independently selects $a \in A$ and $b \in B$, say $(A_i, B_i)$. The report made by the respondent is: […]

The model expectation and variance of the "prediction" are: […]

The selection of a sample of size n using SRSWR as the design generates the reports $R_1, R_2, \ldots, R_n$. The RR procedure generates the data D(…). Then the sample mean of the computed $R_i$'s is used for estimating the mean of the sensitive variable:

$$\frac{1}{n}\sum_{i=1}^{n} R_i \qquad (2.1)$$

Due to the independence: […]

The variance of the model expectation is: […]

Therefore, the expected error of (2.1) is: […], in which one term is due to using the RR procedure.
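As a rough illustration of how a scrambled report can be unscrambled, the sketch below assumes that the report has the form $Z_i = A_i Y_i + B_i$ and that the sampler computes $R_i = (Z_i - \mu_B)/\mu_A$, with $\mu_A \neq 0$ and $\mu_B$ the known means of the sets A and B. This specific form, the function names and the numerical sets are assumptions made here for illustration only.

```python
import random
from statistics import mean

def scrambled_report(y, A, B, rng):
    """Respondent draws a from A and b from B independently and
    reports z = a*y + b (assumed scrambling form)."""
    a = rng.choice(A)
    b = rng.choice(B)
    return a * y + b

def unscramble(z, A, B):
    """Sampler's 'prediction' R = (z - mean(B)) / mean(A).
    Under the assumed form its model expectation equals y."""
    return (z - mean(B)) / mean(A)

def srswr_rr_mean(population_y, n, A, B, rng):
    """Draw n units with replacement, collect scrambled reports and
    return the sample mean of the unscrambled values."""
    sample = [rng.choice(population_y) for _ in range(n)]
    return mean(unscramble(scrambled_report(y, A, B, rng), A, B) for y in sample)

if __name__ == "__main__":
    rng = random.Random(1)
    A = [0.5, 1.0, 1.5]    # hypothetical scrambling set, mean 1.0
    B = [-1.0, 0.0, 1.0]   # hypothetical scrambling set, mean 0.0
    population_y = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(srswr_rr_mean(population_y, 200, A, B, rng))
```

The interviewer only sees the scrambled value, never the pair drawn by the respondent, so the privacy of an individual report rests on the spread of A and B, while the accuracy of the mean depends on their variances.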
Consider the estimation of $\zeta$ and take the naïve estimator: […]

A first-order Taylor series expansion of (2.4) yields the approximation: […]

Its variance is given by: […]
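The first-order (linearization) argument can be sketched numerically. The code below assumes that the naïve estimator is the ratio of the sample mean of the unscrambled reports to the sample mean of X, and that the n pairs are independent draws, as under SRSWR. It uses the standard approximation $V(\bar{r}/\bar{x}) \approx [V(\bar{r}) + \zeta^2 V(\bar{x}) - 2\zeta\, Cov(\bar{r}, \bar{x})]/\mu_X^2$ with plug-in sample quantities. The function names are hypothetical.

```python
from statistics import mean

def ratio_estimate(r, x):
    """Naive ratio estimate: mean of reports over mean of the auxiliary variable."""
    return mean(r) / mean(x)

def linearized_variance(r, x):
    """Plug-in first-order Taylor variance of mean(r)/mean(x) for n i.i.d. pairs.

    The residuals d_i = r_i - zeta*x_i have sample mean zero, and their
    sample variance equals s_r^2 + zeta^2 s_x^2 - 2*zeta*s_rx.
    """
    n = len(r)
    x_bar = mean(x)
    zeta = mean(r) / x_bar
    d = [ri - zeta * xi for ri, xi in zip(r, x)]
    s2_d = sum(di ** 2 for di in d) / (n - 1)
    return s2_d / (n * x_bar ** 2)

if __name__ == "__main__":
    reports = [1.0, 3.0, 2.0, 5.0]      # hypothetical unscrambled reports
    auxiliary = [1.0, 2.0, 2.0, 3.0]    # hypothetical values of X
    print(ratio_estimate(reports, auxiliary))
    print(linearized_variance(reports, auxiliary))
```

Working with the residuals $r_i - \zeta x_i$ avoids computing the two variances and the covariance separately, and it makes clear that the approximation vanishes when the reports are exactly proportional to X.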

RSS for the RR Procedure
For implementing the selection of an RSS, we use SRSWR for choosing independently m samples of size m. The units in each sample are ranked using some additional information on Y, commonly a highly correlated covariate X. Take, for example, the following covariates:
• The known salary of functionaries allows establishing a ranking of variables related to the money obtained through bribes, once their homes are visited.
• The size of the network of people with whom an infected AIDS patient has had sex is known. It is correlated with different sensitive variables of interest.
• The area of a farm is correlated with variables associated with its production. Consider the study of tax evasion variables. The magnitude of undeclared production, sales and other economic issues is sensitive. Ranking by area permits deriving an adequate ranking of the sensitive variables.

The unit occupying place i in the ranked sample $S_{(i)}$ is included in the ranked set sample. Then a sample of size m is obtained. When we need a sample of size n we apply the procedure independently $r \geq 1$ times (cycles). Then we have n = mr sample units.

David and Levine (1972) studied the effect of judgmental ranking errors. They proved that the errors do not affect the properties of RSS. We will use this fact in the sequel and work with judgmental order statistics. Let […]. The mean and the variance of $X_{(i)}$ are given by […], respectively. Takahasi and Wakimoto (1968) showed that the efficiency of RSS relative to SRS is: […], where […] are the sample means using the SRS and RSS methods, respectively. Also, they showed that: […]

See Bouza (2009; 2010), Hussain and Shabbir (2011) and Agarwal et al. (2012) for more insights on these issues. We assume that the ranking is made on Y.
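The selection steps described above can be sketched as follows: in each of r cycles, m sets of m units are drawn by SRSWR, each set is ranked, and the unit in place i of the i-th set is retained. The function name and the toy farm data are hypothetical, and the ranking is assumed to be error free.

```python
import random

def rss_sample(units, key, m, r, rng):
    """Return n = m*r units selected by ranked set sampling.

    In every cycle, m candidate sets of size m are drawn with replacement,
    each set is ranked by `key`, and the i-th ranked unit of the i-th set
    is kept.
    """
    selected = []
    for _ in range(r):
        for i in range(m):
            candidate_set = [rng.choice(units) for _ in range(m)]
            candidate_set.sort(key=key)
            selected.append(candidate_set[i])
    return selected

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical farms given as (area, sensitive value); ranking uses the area only.
    farms = [(area, 2.0 * area + rng.gauss(0.0, 1.0)) for area in range(1, 53)]
    sample = rss_sample(farms, key=lambda farm: farm[0], m=3, r=5, rng=rng)
    print(len(sample))
```

Only the ranking variable has to be observed on the m × m provisionally selected units of a cycle; the sensitive question is put to the m retained units alone.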
For implementing the procedure, each interviewed $u_i$ selects randomly and independently […]. The report of the ith ranked unit in the tth cycle is: […]

Then we can compute for each $u_i$: […]

For each cycle we have that: […]

Therefore, we easily derive that an unbiased estimator of $\mu_Y$ is: […]

The model variance of the report is: […]

The independence of the involved variables sustains that: […]

The relation between $\sigma^2_{[i]}$ and $\sigma^2_Y$ allows writing (Chen et al., 2004): […]

Hence, we can rewrite (3.6) as: […]

The other term of the error is: […]

We implement the ranking of the selected individuals using the information provided by the selected auxiliary variable X. The persons included in each sample randomly select the corresponding insensitive variables A and B. We will consider the cases in which A or B is equal to X. The RSS procedure is used in the m independent samples and in each cycle. The report of an individual $u_i$ is: […]

Consider the ith interviewed unit in the tth cycle and take […]. Therefore, averaging the reports generates a model-unbiased estimate of the mean of Y. Hence: […] is an unbiased estimator of $\mu_Y$, as the reports are model unbiased for the corresponding sensitive variable and the arithmetic mean is design unbiased. Its model variance for the OS of the ith order in cycle t is: […], where $\sigma^2_{A(i)}$ and $\mu_{A(i)}$ are the variance and mean of the OS $A_{(i)}$. Then, the design expectation of the model error for the ith OS is: […]

The variance of (3.8) is given by: […]

Noting that […], this relation between the variance of an OS and the population variance permits rewriting (3.9) as: […]

Note that (3.10) clearly indicates the effect of using RSS on the accuracy with respect to the SRSWR strategy. Then, the estimator of the population ratio is given by: […]

The first-order Taylor series expansion of (3.11) is: […], with variance: […]

Let us consider that the ranking is made using B.
In this case the model report is: […]

The unscrambled variable is: […]

Then $\mu_Y$ is estimated unbiasedly using the estimator: […]

Note that these results allow managing the accuracy of this RSS strategy by using an adequate value of $\mu_A$. Take: […]

Using a Taylor expansion to the first degree of approximation, the estimator in (3.16) will be: […], with variance given by: […]

The variance terms are given by (2.3), (3.7), (3.9) and (3.15) and: […]
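The gain of RSS over SRSWR for the mean of the unscrambled reports can be explored by simulation. The sketch below is not the authors' experiment: it assumes a scrambled report of the form $z = ay + b$ unscrambled with the known means of A and B, error-free ranking on X, and a synthetic population in which Y is proportional to X. All names and numerical values are hypothetical.

```python
import random
from statistics import pvariance

def _mean(values):
    return sum(values) / len(values)

def rr_report(y, A, B, rng):
    """Assumed scrambling z = a*y + b, returned already unscrambled."""
    z = rng.choice(A) * y + rng.choice(B)
    return (z - _mean(B)) / _mean(A)

def srswr_estimate(pop, n, A, B, rng):
    """Mean of n unscrambled reports from units drawn with replacement."""
    return _mean([rr_report(rng.choice(pop)[1], A, B, rng) for _ in range(n)])

def rss_estimate(pop, m, r, A, B, rng):
    """Mean of m*r unscrambled reports from an RSS ranked on x = unit[0]."""
    reports = []
    for _ in range(r):
        for i in range(m):
            ranked = sorted([rng.choice(pop) for _ in range(m)], key=lambda u: u[0])
            reports.append(rr_report(ranked[i][1], A, B, rng))
    return _mean(reports)

def compare(pop, m, r, A, B, reps, seed=0):
    """Monte Carlo variances of the SRSWR and RSS estimators of the mean of Y."""
    rng = random.Random(seed)
    srs = [srswr_estimate(pop, m * r, A, B, rng) for _ in range(reps)]
    rss = [rss_estimate(pop, m, r, A, B, rng) for _ in range(reps)]
    return pvariance(srs), pvariance(rss)

if __name__ == "__main__":
    # Synthetic population of (x, y) pairs with y = 10 * x.
    pop = [(i / 50, 10 * i / 50) for i in range(1, 51)]
    v_srs, v_rss = compare(pop, m=3, r=5, A=[0.9, 1.0, 1.1],
                           B=[-0.5, 0.0, 0.5], reps=2000)
    print(v_srs, v_rss)
```

In this setting the design component of the variance shrinks under RSS while the component introduced by the scrambling is of similar size in both strategies, so the relative gain is smaller when A and B are very dispersed.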

Comparison of the Different Alternatives
Deriving a measure of the gain in accuracy of the estimators based on their variances leads to expressions that are not meaningful for deciding which alternative is the best. These expressions do not allow fixing values of the controllable parameters and establishing the expected behavior of the sampling errors. We therefore designed a series of Monte Carlo experiments based on databases. The experiments were planned to gain insight into the behavior of the distance between the parameters and their estimates. Simulation experiments were conducted and the performance of the estimators was measured by calculating, for each generated sample: […]

Population 3
Farmers selling products directly in the market. The ranking variable was $X = \frac{\text{reported cultivated area of the farm}}{\text{total area of the farm}}$; the sensitive variables were $Y_4$ = unreported income derived from selling their products in the last 6 months, $Y_5$ = real cultivated area and $Y_6$ = income from unauthorized services; N = 52.
Note that in all the cases $X \in [0,1]$. We used the sample size $n = 3 \times 5 = 15$. The distributions of $A^*$ and $B^*$ were fixed as the uniform on (0,1), U(0,1); the standard normal, N(0,1); and the standard asymptotic normal, AN(0,1). The moments, variances and covariances of the OSs were computed using the tables developed by Hastings et al. (1947). The OS […]. We compared the estimators by computing: […]

Hence, the comparison of the proposed estimators shows that $\zeta_{RSS}^{(A)}$ is to be preferred when A = B. The use of AN(0,1) is the best procedure. We consider that these results are supported by the fact that the means and standard deviations of the involved OSs of AN(0,1) are more similar than those of the other two distributions.
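The moments of the order statistics used above were taken from published tables. As a rough cross-check, and not the procedure followed in the paper, they can also be obtained from the exact Beta formulas in the U(0,1) case and approximated by simulation for other distributions. The sketch below covers U(0,1) and N(0,1) only, since the way AN(0,1) is sampled is not specified here. The function names are hypothetical.

```python
import random
from statistics import mean, pvariance

def uniform_os_moments(i, m):
    """Exact mean and variance of the i-th OS of m i.i.d. U(0,1) draws,
    which follows a Beta(i, m - i + 1) distribution."""
    mu = i / (m + 1)
    var = i * (m - i + 1) / ((m + 1) ** 2 * (m + 2))
    return mu, var

def simulated_os_moments(draw, i, m, reps, rng):
    """Monte Carlo mean and variance of the i-th OS (1-based) of m draws.

    `draw` is a function taking the random generator and returning one value.
    """
    values = [sorted(draw(rng) for _ in range(m))[i - 1] for _ in range(reps)]
    return mean(values), pvariance(values)

if __name__ == "__main__":
    rng = random.Random(2024)
    m = 3
    for i in range(1, m + 1):
        exact = uniform_os_moments(i, m)
        approx_uniform = simulated_os_moments(lambda g: g.random(), i, m, 20000, rng)
        approx_normal = simulated_os_moments(lambda g: g.gauss(0.0, 1.0), i, m, 20000, rng)
        print(i, exact, approx_uniform, approx_normal)
```

Comparing the means and variances printed for each rank gives a quick view of how differently the order statistics of two generating distributions behave for a given set size m.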
The results of the analysis when A and B were generated using the maxima are given in Table 2. They are similar to those of Table 1, but the preferred estimator is $\zeta_{RSS}^{(B)}$.
The results of using crossed criteria for generating A and B appear in the next tables. Table 3 presents the results for […]. The discussion of the relationships among the results for the distributions gives rise to similar comments, and AN(0,1) has the best behavior. It is remarkable that it is preferable to use $\zeta_{RSS}^{(B)}$. In this case: […] The relationships are changed as:

$$Cov(R_{RSS}^{B}, x_{RSS}) - Cov(R_{RSS}^{A}, x_{RSS}) < 0.$$

Conclusion
The derived results show that the RSS models are more efficient than the classic SRSWR estimators. The use of sets of auxiliary variables related to $Y_{(1)}$ or $Y_{(N)}$ allows determining which estimator is to be preferred and the expected gain in accuracy. The best method for generating them is to use AN(0,1). A recommendation to practitioners is to fix bounds for the values of the sensitive variable Y; it is then feasible to construct A and B accordingly using U(0,1), N(0,1) or AN(0,1) and to decide which is the more efficient estimator.
[…] The final version is an improved one that benefited from their suggestions. The research of one of the authors was supported by the PNCB "Modelos Matemáticos para el Estudio de Medio Ambiente, Salud y Desarrollo Humano".

Authors' Contributions
Agustín Santiago Moreno: Developed the idea of estimating the proportions using the RSS randomized response method, wrote and edited the paper, and coordinated the overall work and the integration of ideas.
Carlos N. Bouza Herrera: Contributed estimators developed in previous work and some proofs of theorems, and helped draft the final version of the entire document.
José Maclovio Sautto Vallejo: Programmed the simulation algorithms and helped in the interpretation of the results.
Amer Ibrahim Al-Omari: Contributed proofs for the RSS-related estimators, part of the wording of the paper and central ideas, and wrote the paper in English.

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.