Research Article: 2022 Vol: 25 Issue: 4

**Deepjan Gohain, North Eastern Regional Institute of Science and Technology**

**Krishnajyoti Nath, North Eastern Regional Institute of Science and Technology**

**Singh BK, North Eastern Regional Institute of Science and Technology**

**Citation Information**: Gohain, D., Nath, K., & Singh, B.K. (2022). Estimation of population mean using ratio type imputation technique with linear combination of two auxiliary variable under two-phase sampling. *Journal of Management Information and Decision Sciences, 25*(S5), 1-20.

The present paper proposes four generalized classes of estimators of the population mean under a two-phase sampling design that use auxiliary information, and derives expressions for their bias and mean square error. Imputation techniques of this type are used in various decision-science fields. Both theoretical results and empirical studies on several datasets from the classical statistical literature show that the proposed estimators are superior to existing estimators.

Imputation; Bias; Mean Square Error (MSE); Missing Data; Large Sample Approximation; Simple Random Sampling without Replacement (SRSWOR).

Survey responses are often incomplete. A sampled unit may refuse to participate, be unable to respond or impossible to contact, or some of the collected information may be lost accidentally. To deal with missing data effectively, Kalton et al. (1981) and Sande (1979) suggested imputation methods, which make an incomplete data set structurally complete and simplify its analysis. Lee et al. (1994; 1995) used the information on an auxiliary variable for imputation. Later, Singh and Horn (2000) introduced a compromised method of imputation based on an auxiliary variable. Ahmed et al. (2006) discussed several new imputation-based estimators that use the information on an auxiliary variable and compared their performance with the mean method of imputation.

Singh and Horn (2000), Wright and Capps (2011), Singh and Gogoi (2017), Singh and Nath (2018b; 2019) and Joyce et al. (2021) discussed related problems, including the design of a mixed sampling plan based on IPD and imputation methods for missing data when estimating the population mean under a two-phase sampling scheme.

The objective of the present work is to provide estimators that are more efficient than the existing ones when the population parameters of the auxiliary variables are missing or unknown.

Consider a finite population of size N. Let Y be the study variable and X and Z the auxiliary variables, and let \bar{Y}, \bar{X} and \bar{Z} denote the population means of Y, X and Z, respectively.

Consider a first-phase sample S_{1} drawn from the population by SRSWOR, and a second sample S drawn either from S_{1} or from the population.

Case I: the second sample S is drawn from S_{1}, so that S depends on the first sample S_{1} (design I; see Figure 1).

Case II: the second sample S is drawn from the population, so that S is independent of the first sample S_{1} (design II; see Figure 1).
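The difference between the two designs can be illustrated with a short simulation. The Python sketch below is only an illustration: the helper `draw_two_phase` and the sample-size names `n1` (first phase) and `n` (second phase) are hypothetical, because the original symbols for the sample sizes are not reproduced here. Under design I the second sample is a subset of S_{1}; under design II it is drawn from the whole population independently of S_{1}.

```python
import random


def draw_two_phase(N, n1, n, design, seed=None):
    """Draw a first-phase sample S1 (size n1) and a second-phase sample S (size n).

    Hypothetical helper for illustration; units are labelled 0..N-1.
    design "I":  S is drawn by SRSWOR from S1 (S depends on S1).
    design "II": S is drawn by SRSWOR from the whole population, independently of S1.
    """
    rng = random.Random(seed)
    population = range(N)
    s1 = rng.sample(population, n1)  # first phase: SRSWOR from the population
    if design == "I":
        s = rng.sample(s1, n)  # second phase drawn from S1, so it needs n <= n1
    elif design == "II":
        s = rng.sample(population, n)  # second phase drawn afresh from the population
    else:
        raise ValueError("design must be 'I' or 'II'")
    return s1, s


s1, s = draw_two_phase(N=100, n1=40, n=15, design="I", seed=1)
print(set(s) <= set(s1))  # True: under design I every unit of S is also in S1
```

Under design II the same call with `design="II"` may return units of S that are not in S_{1}.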

Let the second sample S consist of responding units, forming the subset R, and non-responding units, forming the subset R^{c}. For every unit i ∈ R the value y_{i} is observed; for every unit i ∈ R^{c} the value y_{i} is missing and an imputed value is computed instead. The values of the auxiliary variables, assumed known for all units in S and S_{1}, are used as the source of imputation for the missing data.

We further use the population correlation coefficients between X and Y, between Y and Z, and between Z and X, and the coefficients of variation of X, Y and Z.

Using the concept of two-phase sampling, and denoting by E_{1} and E_{2} the expectations over the first and second phases respectively, we have the following expected values.

Case I: when S is drawn from S_{1}

Similarly,

Similarly,

Similarly,

Case II: when S is drawn from the population, independently of S_{1}

Similarly,

Similarly,

Similarly,

**Mean Method of Imputation**

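Under the mean method of imputation, each missing value y_{i}, i ∈ R^{c}, is replaced by the mean of the responding units.

Using the above, the point estimator of the population mean is the mean of the responding units.

The bias and variance are given by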
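As a minimal sketch, assuming the textbook form of mean imputation (each missing y_{i} replaced by the respondent mean), the Python code below shows that the mean of the completed sample coincides with the respondent mean. The helper name and the toy data are illustrative only and do not come from the paper.

```python
from statistics import mean


def mean_imputation_estimate(y, responded):
    """Mean method of imputation (hypothetical helper).

    Each missing y_i (i in R^c) is replaced by the respondent mean y_bar_r,
    and the completed second-phase sample is averaged.
    """
    y_bar_r = mean(v for v, ok in zip(y, responded) if ok)  # mean over R
    completed = [v if ok else y_bar_r for v, ok in zip(y, responded)]
    return mean(completed), y_bar_r


# Toy second-phase sample; None marks a non-respondent (unit in R^c).
y = [12.0, 15.0, None, 9.0, None, 14.0]
responded = [v is not None for v in y]
estimate, y_bar_r = mean_imputation_estimate(y, responded)
print(estimate, y_bar_r)  # 12.5 12.5
```

Because the auxiliary variables are not used, this estimator does not depend on the first-phase sample.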

**Ratio Method of Imputation (Lee et al., 1994)**

Under the ratio method of imputation, each missing value is imputed by a ratio-type prediction based on the auxiliary variable.

Using the above, the point estimator of the population mean \bar{Y} is

The bias and MSE are given by
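A minimal sketch of ratio imputation, assuming the common textbook form in which a missing y_{i} is replaced by b̂x_{i} with b̂ = Σ_{R} y_{i} / Σ_{R} x_{i} (the paper's own expressions are not reproduced here). Under this assumption the completed-sample mean reduces to \bar{y}_{r}\bar{x}_{n}/\bar{x}_{r}, where \bar{x}_{n} is the mean of x over the whole second sample; the code checks this identity numerically on toy data. Helper names and data are hypothetical.

```python
def ratio_imputation_estimate(y, x, responded):
    """Ratio method of imputation (assumed textbook form, hypothetical helper).

    A missing y_i is replaced by b_hat * x_i, where b_hat = sum_R y_i / sum_R x_i.
    Returns the mean of the completed second-phase sample.
    """
    y_R = [yi for yi, ok in zip(y, responded) if ok]
    x_R = [xi for xi, ok in zip(x, responded) if ok]
    b_hat = sum(y_R) / sum(x_R)
    completed = [yi if ok else b_hat * xi for yi, xi, ok in zip(y, x, responded)]
    return sum(completed) / len(completed)


def ratio_closed_form(y, x, responded):
    """Closed form y_bar_r * x_bar_n / x_bar_r of the same estimator."""
    y_R = [yi for yi, ok in zip(y, responded) if ok]
    x_R = [xi for xi, ok in zip(x, responded) if ok]
    y_bar_r = sum(y_R) / len(y_R)
    x_bar_r = sum(x_R) / len(x_R)
    x_bar_n = sum(x) / len(x)  # x is observed for every unit of the second sample
    return y_bar_r * x_bar_n / x_bar_r


y = [12.0, 15.0, None, 9.0, None, 14.0]
x = [10.0, 13.0, 11.0, 8.0, 12.0, 11.0]
responded = [v is not None for v in y]
print(round(ratio_imputation_estimate(y, x, responded), 4))  # 12.8968
print(round(ratio_closed_form(y, x, responded), 4))          # 12.8968
```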

**Compromised Method of Imputation (Singh & Horn, 2000)**

Under this method of imputation

Using the above, the point estimator of the population mean \bar{Y} is

where β is a constant chosen so that the MSE of \bar{y}_{comp} is minimum.

The bias and MSE at the optimum value of β are
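The compromised scheme can be checked numerically. This sketch assumes the form commonly attributed to Singh and Horn (2000): y_{.i} = βn y_{i}/r + (1 − β)b̂x_{i} for i ∈ R and y_{.i} = (1 − β)b̂x_{i} for i ∈ R^{c}, with r the number of respondents, n the second-sample size and b̂ = Σ_{R} y_{i} / Σ_{R} x_{i}. Under that assumption the completed-sample mean equals β\bar{y}_{r} + (1 − β)\bar{y}_{r}\bar{x}_{n}/\bar{x}_{r}. The helper names and toy data are hypothetical.

```python
def compromised_imputation_estimate(y, x, responded, beta):
    """Compromised imputation (assumed Singh & Horn form, hypothetical helper).

    i in R:   beta * (n / r) * y_i + (1 - beta) * b_hat * x_i
    i in R^c: (1 - beta) * b_hat * x_i
    """
    n = len(y)
    r = sum(responded)  # number of respondents
    b_hat = (sum(yi for yi, ok in zip(y, responded) if ok)
             / sum(xi for xi, ok in zip(x, responded) if ok))
    completed = [
        beta * (n / r) * yi + (1 - beta) * b_hat * xi if ok else (1 - beta) * b_hat * xi
        for yi, xi, ok in zip(y, x, responded)
    ]
    return sum(completed) / n


def compromised_closed_form(y, x, responded, beta):
    """beta * y_bar_r + (1 - beta) * y_bar_r * x_bar_n / x_bar_r."""
    y_R = [yi for yi, ok in zip(y, responded) if ok]
    x_R = [xi for xi, ok in zip(x, responded) if ok]
    y_bar_r = sum(y_R) / len(y_R)
    x_bar_r = sum(x_R) / len(x_R)
    x_bar_n = sum(x) / len(x)
    return beta * y_bar_r + (1 - beta) * y_bar_r * x_bar_n / x_bar_r


y = [12.0, 15.0, None, 9.0, None, 14.0]
x = [10.0, 13.0, 11.0, 8.0, 12.0, 11.0]
responded = [v is not None for v in y]
print(round(compromised_imputation_estimate(y, x, responded, 0.3), 4))  # 12.7778
print(round(compromised_closed_form(y, x, responded, 0.3), 4))          # 12.7778
```

Under this form, β = 1 reproduces the mean method and β = 0 the ratio method, which is why β is then tuned to minimize the MSE.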

**Exponential Ratio Method of Imputation in two-phase sampling (Pandey et al., 2015)**

Under this method of imputation

Using the above, the point estimator of the population mean \bar{Y} is

where α is a suitably chosen constant that minimizes the MSE of the estimator.

The bias and MSE at the optimum value of α are given by

**Dual to Ratio Method of Imputation in two-phase sampling (Singh & Nath, 2018a)**

Under this method of imputation

Using the above, the point estimator of the population mean \bar{Y} is

where β is a suitably chosen constant that minimizes the MSE of the estimator.

The bias and MSE at the optimum value of β are given by

Motivated by the above imputation methods, we propose the following multivariate ratio-type imputation methods for the population mean in two-phase sampling.

**Imputation Method 1**

The imputation scheme is as follows:

**Imputation Method 2**

The imputation scheme is as follows:

**Imputation Method 3**

The imputation scheme is as follows:

**Imputation Method 4**

The imputation scheme is as follows:

The point estimators of the population mean under the four proposed imputation methods follow directly:

In general, the above four imputation methods can be expressed as a single class.

The imputation scheme is as follows:

The corresponding point estimator of the population mean is

where α_{1} and α_{2} are suitably chosen constants, determined so that the MSE of the point estimator is minimum.

Expanding the estimator and retaining terms up to the first order of approximation, we have

Where,

The bias, MSE and minimum MSE of the proposed point estimator are derived in the following theorems.

**Theorem 1**

The biases of the estimator under design I and design II, up to the first order of approximation, are:

**Proof**: Taking expectations on both sides of equation (1), we have

Putting the expected values under design I, we have

Putting the expected values under design II, we have

**Theorem 2**

The MSEs of the estimator under design I and design II, up to the first order of approximation, are:

**Proof**: Squaring both sides of (3) and taking expectations, we have

Putting the expected values under design *I*, we have

Putting the expected values under design *II*, we have

The optimum values of α_{1} and α_{2} are obtained by minimizing the MSEs given in equations (6) and (7) with the method of maxima and minima:
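The minimization step can be sketched generically. Assuming the first-order MSE is a quadratic in the two constants, MSE(α_{1}, α_{2}) = c − 2(α_{1}b_{1} + α_{2}b_{2}) + a_{11}α_{1}^{2} + 2a_{12}α_{1}α_{2} + a_{22}α_{2}^{2}, setting both partial derivatives to zero gives two linear normal equations, and the minimum MSE is c − (α_{1}b_{1} + α_{2}b_{2}) at their solution. The coefficients a_{11}, a_{12}, a_{22}, b_{1}, b_{2}, c are hypothetical placeholders for the expressions in equations (6) and (7), which are not reproduced here; the numbers below are purely illustrative.

```python
def optimum_alphas(a11, a12, a22, b1, b2, c):
    """Minimise MSE(a1, a2) = c - 2*(a1*b1 + a2*b2) + a11*a1**2 + 2*a12*a1*a2 + a22*a2**2.

    Setting both partial derivatives to zero gives the normal equations
        a11*a1 + a12*a2 = b1
        a12*a1 + a22*a2 = b2,
    solved here by Cramer's rule. Coefficient names are hypothetical placeholders.
    """
    det = a11 * a22 - a12 ** 2
    if a11 <= 0 or det <= 0:
        raise ValueError("quadratic form is not positive definite: no unique minimum")
    alpha1 = (b1 * a22 - a12 * b2) / det
    alpha2 = (a11 * b2 - a12 * b1) / det
    min_mse = c - (alpha1 * b1 + alpha2 * b2)  # value of the MSE at the optimum
    return alpha1, alpha2, min_mse


def mse(alpha1, alpha2, a11, a12, a22, b1, b2, c):
    """Evaluate the same quadratic MSE at arbitrary (alpha1, alpha2)."""
    return (c - 2 * (alpha1 * b1 + alpha2 * b2)
            + a11 * alpha1 ** 2 + 2 * a12 * alpha1 * alpha2 + a22 * alpha2 ** 2)


coef = dict(a11=2.0, a12=0.5, a22=1.0, b1=1.0, b2=0.5, c=3.0)  # illustrative numbers only
alpha1, alpha2, min_mse = optimum_alphas(**coef)
print(round(alpha1, 4), round(alpha2, 4), round(min_mse, 4))  # 0.4286 0.2857 2.4286
```

The positive-definiteness check mirrors the second-order condition of the maxima-and-minima method: without it the stationary point need not be a minimum.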

Substituting the optimum values of α_{1} and α_{2} under design I and design II in equations (2) and (3) and solving, we have

Substituting the optimum values of α_{1} and α_{2} under designs I and II in equations (4) and (5), we have

**Theorem 3**

The proposed estimator is unbiased at the optimum values of α_{1} and α_{2} under designs I and II.

**Proof**: Substituting the optimum values of α_{1} and α_{2} under design I in equation (6), we have

Similarly, substituting the optimum values of α_{1} and α_{2} under design II in equation (7), we have

In this section we derive the conditions under which the suggested estimator is superior to the existing estimators under design I and design II. To compare the estimators, we use the following result on multiple correlation coefficients.
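The comparison presumably rests on the standard identity for the squared multiple correlation of Y on X and Z, R^{2}_{y.xz} = (ρ_{yx}^{2} + ρ_{yz}^{2} − 2ρ_{yx}ρ_{yz}ρ_{xz}) / (1 − ρ_{xz}^{2}), together with R^{2}_{y.xz} − ρ_{yx}^{2} = (ρ_{yz} − ρ_{yx}ρ_{xz})^{2} / (1 − ρ_{xz}^{2}) ≥ 0, so using both auxiliary variables never does worse than using X alone. The ρ notation and the numbers in this sketch are assumptions for illustration, since the paper's own symbols are not reproduced here.

```python
def multiple_r2(rho_yx, rho_yz, rho_xz):
    """Squared multiple correlation of Y on X and Z from the pairwise correlations."""
    if abs(rho_xz) >= 1:
        raise ValueError("X and Z must not be perfectly correlated")
    return (rho_yx ** 2 + rho_yz ** 2 - 2 * rho_yx * rho_yz * rho_xz) / (1 - rho_xz ** 2)


rho_yx, rho_yz, rho_xz = 0.8, 0.6, 0.5  # illustrative values only
r2 = multiple_r2(rho_yx, rho_yz, rho_xz)
# The gain over using X alone is a square divided by a positive number, hence >= 0.
gain_over_x = (rho_yz - rho_yx * rho_xz) ** 2 / (1 - rho_xz ** 2)
print(round(r2, 4), r2 >= max(rho_yx ** 2, rho_yz ** 2))  # 0.6933 True
```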

**Comparison with Mean Method of Imputation**

The proposed estimator is always more efficient than the mean method estimator under both design I and design II.

**Comparison with Ratio Method of Imputation**

The proposed estimator is always more efficient than the ratio method estimator under both design I and design II.

**Comparison with Compromised Method of Imputation**

**Comparison with Exponential Ratio Method of Imputation**

**Comparison with dual to Ratio Method of Imputation**

The proposed estimator is always more efficient than the dual to ratio method estimator under both design I and design II.

To examine the performance of the proposed estimator of the population mean in two-phase sampling, we consider the three populations described below (**Tables 1-10**).

Table 1: MSE of the different estimators under design I

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Mean method | 2.026693 | 2559.906609 | 10.369737 |
| Ratio method | 1.875253 | 1631.420886 | 8.955755 |
| Compromised method | 1.757726 | 1630.937426 | 8.752709 |
| Exponential ratio method | 1.324985 | 1084.832174 | 7.845193 |
| Dual to ratio method | 1.152552 | 448.613009 | 6.731424 |
| Proposed estimator | 1.090778 | 446.556316 | 6.273775 |

Table 2: MSE of the different estimators under design II

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Mean method | 2.026693 | 2559.906609 | 10.369737 |
| Ratio method | 1.875253 | 1631.420886 | 8.955755 |
| Compromised method | 1.757726 | 1630.937426 | 8.752709 |
| Exponential ratio method | 1.117529 | 828.239053 | 6.4115601 |
| Dual to ratio method | 1.117496 | 405.529871 | 6.356754 |
| Proposed estimator | 1.053244 | 403.431209 | 5.851978 |

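Table 3: PRE (percent relative efficiency) of the different estimators with respect to the ratio method under design I

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Ratio method | 100.000 | 100.000 | 100.000 |
| Compromised method | 106.686 | 100.029 | 102.319 |
| Exponential ratio method | 141.530 | 150.385 | 114.156 |
| Dual to ratio method | 162.704 | 363.659 | 133.044 |
| Proposed estimator | 171.918 | 365.334 | 142.749 |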
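The PRE values can be recomputed from the MSE values. Assuming PRE denotes the percent relative efficiency 100 × MSE(reference) / MSE(estimator), the sketch below takes the Population I column of Table 1 (design I, rows two to six) with the second row as the reference; the method labels in the dictionary assume the rows follow the order in which the estimators are introduced. The results agree with the Population I column of Table 3 up to rounding of the last reported digit.

```python
# MSE values copied from Table 1 (design I, Population I), rows 2 to 6.
# The method labels assume the rows follow the order in which the estimators are introduced.
mse_pop1_design1 = {
    "ratio": 1.875253,
    "compromised": 1.757726,
    "exponential ratio": 1.324985,
    "dual to ratio": 1.152552,
    "proposed": 1.090778,
}


def pre(mse_reference, mse_estimator):
    """Percent relative efficiency of an estimator with respect to a reference estimator."""
    return 100.0 * mse_reference / mse_estimator


reference = mse_pop1_design1["ratio"]
for name, value in mse_pop1_design1.items():
    print(f"{name:<18} {pre(reference, value):8.3f}")
```

The same function applied to other columns and reference rows gives the entries of Tables 4-10.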

Table 4: PRE of the different estimators with respect to the ratio method under design II

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Ratio method | 100.000 | 100.000 | 100.000 |
| Compromised method | 106.686 | 100.029 | 102.319 |
| Exponential ratio method | 167.803 | 196.975 | 139.681 |
| Dual to ratio method | 167.808 | 402.294 | 140.886 |
| Proposed estimator | 178.045 | 404.386 | 153.038 |

Table 5: PRE of the different estimators with respect to the compromised method under design I

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Compromised method | 100.000 | 100.000 | 100.000 |
| Exponential ratio method | 132.660 | 150.340 | 111.568 |
| Dual to ratio method | 152.507 | 363.551 | 130.027 |
| Proposed estimator | 161.144 | 365.225 | 139.513 |

Table 6: PRE of the different estimators with respect to the compromised method under design II

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Compromised method | 100.000 | 100.000 | 100.000 |
| Exponential ratio method | 157.287 | 196.916 | 136.514 |
| Dual to ratio method | 157.292 | 402.174 | 137.691 |
| Proposed estimator | 166.887 | 404.267 | 149.568 |

Table 7: PRE of the different estimators with respect to the exponential ratio method under design I

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Exponential ratio method | 100.000 | 100.000 | 100.000 |
| Dual to ratio method | 114.961 | 241.819 | 116.546 |
| Proposed estimator | 121.472 | 242.933 | 125.047 |

Table 8: PRE of the different estimators with respect to the exponential ratio method under design II

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Exponential ratio method | 100.000 | 100.000 | 100.000 |
| Dual to ratio method | 100.003 | 204.236 | 100.086 |
| Proposed estimator | 106.103 | 205.299 | 109.562 |

Table 9: PRE of the different estimators with respect to the dual to ratio method under design I

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Dual to ratio method | 100.000 | 100.000 | 100.000 |
| Proposed estimator | 105.666 | 100.460 | 107.295 |

Table 10: PRE of the different estimators with respect to the dual to ratio method under design II

| Point estimator | Population I | Population II | Population III |
|---|---|---|---|
| Dual to ratio method | 100.000 | 100.000 | 100.000 |
| Proposed estimator | 106.100 | 100.495 | 108.626 |

**Population I (Cochran, 1977)**

Y: Number of placebo children

X: Number of paralytic polio cases in the placebo group

Z: Number of paralytic polio cases in the 'not inoculated' group

**Population II (Murthy, 1967)**

Y: Area under wheat in 1964

X: Area under wheat in 1963

Z: Cultivated area in 1961

**Population III (Anderson, 2003)**

Y: Head length of second son

X: Head length of first son

Z: Head breadth of first son

The theoretical comparisons and the tables above show that the suggested estimator has a smaller mean square error (MSE) than the existing estimators in both the dependent (design I) and independent (design II) cases. Moreover, the bias of the proposed estimator vanishes at the optimum values of α_{1} and α_{2}. We therefore conclude that the proposed estimator is preferable to the existing estimators.

Ahmed, M.S., Al-Titi, O., Al-Rawi, Z., & Abu-Dayyeh, W. (2006). Estimation of a population mean using different imputation methods. *Statistics in Transition*, *7*(6), 1247-1264.

Anderson, T.W. (2003). *An introduction to multivariate statistical analysis* (3rd ed.). John Wiley & Sons, New York.

Cochran, W.G. (1977). *Sampling techniques*. John Wiley & Sons.

Lee, H., Rancourt, E., & Särndal, C.E. (1994). Experiments with variance estimation from survey data with imputed values. *Journal of Official Statistics*, *10*(3), 231-243.

Joyce, V.J., Merlin, G.S., Edna, K.R.J., & Fenella, S. (2021). Designing Mixed Sampling Plan Based on IPD. *Journal of Management Information and Decision Sciences*, *24*, 1-6.

Kalton, G., Kasprzyk, D., & Santos, R. (1981). Issues of nonresponse and imputation in the survey of income and program participation. In *Current topics in survey sampling* (pp. 455-480). Academic Press.


Lee, H., Rancourt, E., & Särndal, C.E. (1995). Variance estimation in the presence of imputed data for the generalized estimation system. *Proceedings of the American Statistical Association (Social Survey Research Methods Section)*, 384-389.

Murthy, M.N. (1967). *Sampling: Theory and methods*. Statistical Pub. Society.

Pandey, R., Thakur, N.S., & Yadav, K. (2015). Estimation of population mean using exponential ratio type imputation method under survey non-response. *Journal of the Indian Statistical Association*, *53*(1), 89-107.

Sande, I.G. (1979). A personal view of hot deck imputation procedures. *Survey Methodology*, *5*, 238-246.

Singh, B.K., & Nath, K. (2018a). Estimation of population mean using ratio cum product compromised method of imputation in two-phase sampling scheme in sample survey. *Asian Journal of Mathematics & Statistics*, *11*(1), 27-39.

Singh, B.K., & Nath, K. (2019). Generalized class of dual to product cum dual to ratio estimator for population mean with imputation of missing data in two-phase sampling scheme. *International Journal of Mathematics and Statistics (IJMS)*, *21*, 86-96.

Singh, B.K., & Gogoi, U. (2017). Estimation of population mean using exponential dual to ratio type compromised imputation for missing data in survey sampling. *J Stat Appl Pro*, *3*, 515-522.


Singh, B.K., & Nath, K. (2018b). Some Imputation Methods in Two-Phase Sampling Scheme for Estimation of Population Mean. *Research & Reviews: Journal of Statistics (RRJoST), 7*(1), 1-16.

Singh, S., & Horn, S. (2000). Compromised imputation in survey sampling. *Metrika*, *51*(3), 267-276.


Wright, K., & Capps, C. (2011). A survey of information systems development project performance. *Academy of Information and Management Sciences Journal*, *14*(1), 87-105.

**Received:** 10-Feb-2022, Manuscript No. JMIDS-22-11218; **Editor assigned:** 15-Feb-2022, PreQC No. JMIDS-22-11218(PQ); **Reviewed:** 07-Mar-2022, QC No. JMIDS-22-11218; **Revised:** 29-Mar-2022, Manuscript No. JMIDS-22-11218 (R); **Published:** 05-Apr-2022