Modification on PPS Sample Scheme with Replacement

Corresponding Author: Ayed R.A. Alanzi, Department of Mathematics, College of Science and Human studies at Hotat Sudair, Majmaah University, Majmaah 11952, Saudi Arabia. Email: a.alanzi@mu.edu.sa, auid403@hotmail.com
Abstract: In this paper we develop an alternative estimator for the Probability Proportional to Size (PPS) with replacement sampling scheme when certain characteristics under study are positively correlated with the selection probability. An analogue of the well-known superpopulation model for finite populations is also suggested and is used to compare the proposed estimator with the Hansen and Hurwitz estimator. Finally, an empirical investigation of the performance of the proposed estimator is presented.


Introduction
Probability Proportional to Size (PPS) sampling is a method of sampling from a finite population in which a size measure is available for each population unit before sampling and the probability of selecting a unit is proportional to its size.
Consider a finite population $U = (U_1, U_2, \ldots, U_N)$ consisting of N distinct and identifiable units. Let $y_i$ be the value of the study variable y on the unit $U_i$, i = 1, …, N. In practice we wish to estimate the population total $Y = \sum_{i=1}^{N} y_i$ with maximum precision from the y values of the units drawn in a sample $(u_1, u_2, \ldots, u_n)$. The simplest probability sampling scheme for drawing a sample u is Simple Random Sampling with Replacement (SRSWR), for which an unbiased estimator of Y is given by:
$$\hat{Y}_{SRS} = \frac{N}{n}\sum_{i=1}^{n} y_i$$
with variance given by:
$$V(\hat{Y}_{SRS}) = \frac{N^2 \sigma_y^2}{n}, \qquad \sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \frac{Y}{N}\right)^2$$
Hansen and Hurwitz (1943) proposed sampling with Probability Proportional to Size (PPS) with replacement for positively correlated characteristics. The scheme is carried out as follows: one unit is selected at each of the n draws, and at each draw the i-th unit of the population is selected with probability:
$$p_i = \frac{x_i}{X}$$
where $x_i$ is the size measure for the i-th unit and:
$$X = \sum_{i=1}^{N} x_i$$
They gave the following estimator of the population total Y:
$$\hat{T}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
with variance given by:
$$V(\hat{T}_{HH}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y\right)^2$$
PPS sampling is expected to be more efficient than SRS sampling if the regression line of y on x passes through the origin. When it does not, a transformation of the auxiliary variable can be made so that PPS sampling with modified sizes becomes more efficient. Reddy and Rao (1977) suggested that the sample be selected with replacement with probability proportional to revised sizes, where the revised sizes are obtained through a location shift in the auxiliary variable.
However, only one measure of size is usually used in selecting primary sampling units in a PPS scheme. It may therefore happen that some of the study variables are poorly but positively correlated with the selection probabilities, rendering the existing estimator inadequate. An alternative estimator was proposed by Rao (1966). Bansal and Singh (1985), Amahia et al. (1989), Enang and Amahia (2012) and others have proposed estimators for characteristics that are poorly correlated with the selection probabilities. Sahoo et al. (1994) suggested a simple transformation of the auxiliary variable for the case where the correlation between the study variable and the auxiliary variable is highly negative. Bedi and Rao (1997) gave a new direction in determining an estimator of the population total under the PPSWR sampling scheme when the correlation between the auxiliary variable and the study variable is negative.
In this paper we suggest a simple transformation of x to $x^*$ such that $x^* = (x + x_i)$.
We also obtain the condition under which the proposed estimator is more efficient than the Hansen and Hurwitz (1943) estimator. The condition is derived under the superpopulation model given below.
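As a minimal sketch of the Hansen and Hurwitz (1943) scheme recalled in this section, the following Python code draws a PPS sample with replacement and estimates the total as the mean of $y_i / p_i$ over the draws. The function names, the seeded `random.Random` generator and the toy data are assumptions made for illustration; they do not come from the paper.

```python
import random

def selection_probabilities(x):
    """Return p_i = x_i / X for positive size measures x."""
    total = sum(x)
    return [xi / total for xi in x]

def draw_pps_wr(p, n, rng):
    """Draw n unit indices with replacement; unit i has probability p[i] at each draw."""
    return rng.choices(range(len(p)), weights=p, k=n)

def hansen_hurwitz_total(y, p, sample):
    """Estimate the population total as the mean of y_i / p_i over the drawn units."""
    return sum(y[i] / p[i] for i in sample) / len(sample)

# Toy data (hypothetical): y is exactly proportional to x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
p = selection_probabilities(x)
sample = draw_pps_wr(p, n=3, rng=random.Random(0))
estimate = hansen_hurwitz_total(y, p, sample)
```

Because y is exactly proportional to x in this toy population, every ratio $y_i / p_i$ equals the total Y = 20, so the estimate matches Y (up to floating-point rounding) whichever units are drawn. This is the regression-through-the-origin case in which PPS sampling is most efficient.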

The Superpopulation Model
Let $y_i$ and $p_i$ denote the value of the characteristic y and the relative measure of size p for the i-th (i = 1, 2, …, N) unit in the population, respectively. A general superpopulation model suitable for our case expresses $y_i$ in terms of $p_i$ and errors $e_i$, with the conditions on the $e_i$ stated through E(.), where E(.) denotes the average over all finite populations that can be drawn from the superpopulation. The superpopulation model has been used successfully in many papers to compare different sampling strategies; see Godambe (1955), Brewer (1963), Rao (1966), Hanurav (1976) and many others. PPS sampling is considered to be more efficient than SRS sampling if the regression line of y on x passes through the origin (Raj, 1954). When it does not, a transformation of the auxiliary variable can be made so that PPS sampling with modified sizes becomes more precise.

Suggested Estimator
Suppose that the auxiliary variable x > 0 has a positive correlation with the study variable y. We then suggest the following transformation of x to $x^*$ such that $x^* = (x + x_i)$, i = 1, 2, …, N. Naturally, $x^*$ is greater than zero. Further, it is easy to see that the correlation between y and $x^*$ is also positive. Writing $x_i^*$ for the transformed size of the i-th unit, the modified probabilities of selection become:
$$p_i^* = \frac{x_i^*}{\sum_{j=1}^{N} x_j^*}$$
Then the estimator of the population total Y is given by:
$$\hat{Y}_P = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i^*}$$
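The transformation and the resulting estimator can be sketched in Python as follows. The source writes the shift only as x; this sketch treats it as a constant supplied by the caller through the hypothetical parameter `shift`, which is an assumption, as are the function names and the toy data.

```python
def shifted_probabilities(x, shift):
    """Return p*_i = x*_i / sum_j x*_j, where x*_i = shift + x_i."""
    x_star = [shift + xi for xi in x]
    total = sum(x_star)
    return [v / total for v in x_star]

def proposed_total(y, p_star, sample):
    """Hansen-Hurwitz-type estimate of the total using the modified probabilities."""
    return sum(y[i] / p_star[i] for i in sample) / len(sample)

# Toy data (hypothetical). The shift is set to the mean of x purely for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 8.0]
shift = sum(x) / len(x)
p_star = shifted_probabilities(x, shift)
# Indices of the drawn units; unit 3 happens to be drawn twice.
estimate = proposed_total(y, p_star, sample=[0, 3, 3])
```

With `shift = 0` the function returns the usual probabilities $x_i / X$, so the sketch reduces to the Hansen and Hurwitz scheme. A positive shift pulls the selection probabilities toward equal probabilities.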

The Variance of the Suggested Estimator and its Expected Value
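It is well known that the variance of the usual estimator $\hat{T}_{HH}$ is given by:
$$V(\hat{T}_{HH}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y\right)^2$$
A corresponding variance expression holds for the estimator due to Rao. The variance of the proposed estimator is obtained by replacing $p_i$ by $p_i^*$ in (7) and is given by:
$$V(\hat{Y}_P) = \frac{1}{n}\sum_{i=1}^{N} p_i^*\left(\frac{y_i}{p_i^*} - Y\right)^2$$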
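The variance comparison can be checked numerically on a small population. The sketch below evaluates the standard with-replacement variance $\frac{1}{n}\sum_i q_i (y_i/q_i - Y)^2$ for any set of draw probabilities q and reports a percentage efficiency in the conventional form 100 × V(reference) / V(candidate). The function names, the toy population, the shift value and that efficiency definition are assumptions, not taken from the paper.

```python
def pps_wr_variance(y, q, n):
    """Exact variance of (1/n) * sum(y_i / q_i) under n with-replacement draws
    in which unit i is selected with probability q[i]."""
    total = sum(y)
    return sum(qi * (yi / qi - total) ** 2 for yi, qi in zip(y, q)) / n

def percent_efficiency(v_reference, v_candidate):
    """Efficiency of the candidate relative to the reference, in percent."""
    return 100.0 * v_reference / v_candidate

# Toy population (hypothetical): y has a large intercept relative to x.
x = [1.0, 2.0, 3.0, 4.0]
y = [12.0, 11.0, 14.0, 13.0]
n = 2
p = [xi / sum(x) for xi in x]

shift = 10.0  # hypothetical shift, chosen only for illustration
x_star = [shift + xi for xi in x]
p_star = [v / sum(x_star) for v in x_star]

v_hh = pps_wr_variance(y, p, n)      # variance under the original probabilities
v_p = pps_wr_variance(y, p_star, n)  # variance under the shifted probabilities
eff = percent_efficiency(v_hh, v_p)
```

In this toy population the regression of y on x is far from passing through the origin, and the shifted probabilities give the smaller variance. For data where y is nearly proportional to x, the same functions show the original probabilities performing better.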

Robustness of the Estimator
We now state two lemmas that are useful for comparing the estimators.

Lemma 1
Royall (1970). The difference between the two variances can be written in terms of $c_i = (Np_i - 1)$ and a second quantity $b_i$. Note that the first term of this expression is always positive. For the second term we observe that $c_i$ is an increasing function of $p_i$. So, in view of Royall's Lemma 1, the second term can be handled by requiring $b_i$ to be an increasing function of $p_i$. Differentiating $b_i$ with respect to $p_i$ gives the sufficient condition under which $\hat{T}_{HH}$ has smaller variance than $\hat{Y}_P$. Hence the theorem is proved.

Empirical Study
To compare the behavior of the estimator $\hat{Y}_P$ with that of the conventional estimator $\hat{T}_{HH}$, we consider the five populations A, B, C, D and E, details of which are given in Table 1. Populations A, B and C are the same as the three populations of Yates and Grundy (1953), whereas population D is from Stuart (1986). Population E is from Stuart (1986) and population F is from Amahia et al. (1989). Table 3 gives the percentage efficiency of the proposed estimators p