\[
= \frac{1}{N}\left[\sum_{i=1}^{M} p_i\left(\sigma_i^2+\theta_i^2\right)-\theta^2\right]
= \frac{1}{N}\sum_{i=1}^{M} p_i\sigma_i^2+\frac{1}{N}\sum_{i=1}^{M} p_i\left(\theta_i-\theta\right)^2 .
\]

Comparing the variances of the naive estimator $\hat\theta$ and the proportional stratified estimator $\hat\theta_{PS}$ gives

\[
\operatorname{Var}(\hat\theta)-\operatorname{Var}(\hat\theta_{PS})
= \frac{1}{N}\sum_{i=1}^{M} p_i\left(\theta_i-\theta\right)^2 ,
\]

which is the amount of variance that has been removed through proportional stratified sampling. In theory we can do better than this. If Equation (5.17) is minimized subject to $\sum_{i=1}^{M} n_i = N$, it is found that the optimum number to select from the $i$th stratum is

\[
n_i^{*} = \frac{N p_i\sigma_i}{\sum_{j=1}^{M} p_j\sigma_j},
\]

in which case the variance becomes

\[
\operatorname{Var}(\hat\theta_{opt}) = \frac{1}{N}\left(\sum_{i=1}^{M} p_i\sigma_i\right)^{2} = \frac{\bar\sigma^{2}}{N},
\]

say. However,

\[
\sum_{i=1}^{M} p_i\left(\sigma_i-\bar\sigma\right)^2 = \sum_{i=1}^{M} p_i\sigma_i^2-\bar\sigma^{2}.
\]

Therefore, from the variances of the proportional and optimal estimators,

\[
\operatorname{Var}(\hat\theta_{PS})-\operatorname{Var}(\hat\theta_{opt})
= \frac{1}{N}\sum_{i=1}^{M} p_i\left(\sigma_i-\bar\sigma\right)^2 .
\]

Now the various components of the variance of the naive estimator can be shown:

\[
\operatorname{Var}(\hat\theta)
= \frac{1}{N}\left[\sum_{i=1}^{M} p_i\left(\theta_i-\theta\right)^2
+\sum_{i=1}^{M} p_i\left(\sigma_i-\bar\sigma\right)^2+\bar\sigma^{2}\right].
\]

The right-hand side of this decomposition contains, respectively, the variance removed by using the proportional $\{n_i\}$ rather than the naive estimator, the variance removed by using the optimal $\{n_i\}$ rather than the proportional $\{n_i\}$, and the residual variance.

Now imagine that very fine stratification is employed (i.e. $M \to \infty$). Then the outcome $X \in S^{(i)}$ is replaced by the actual value of $X$, and so the decomposition becomes

\[
\operatorname{Var}(\hat\theta)
= \frac{1}{N}\left\{\operatorname{Var}_X\left[E(Y\mid X)\right]+E_X\left[\sigma^{2}(Y\mid X)\right]\right\}.
\]

The first term on the right-hand side of this equation is the amount of variance removed from the naive estimator using proportional sampling. The second term is the residual variance after doing so.

Figure: An example where X is a good stratification but poor control variable (horizontal axis X, vertical axis Y).
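The decomposition above can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `stratified_variances` and the three-stratum inputs are made up for illustration, and the stratum probabilities, means and standard deviations are assumed to be known exactly (in practice they would be estimated).

```python
def stratified_variances(p, theta, sigma, N):
    """Variances of the naive, proportional and optimal estimators.

    p[i]     : probability of stratum i (assumed to sum to 1)
    theta[i] : mean of Y within stratum i
    sigma[i] : standard deviation of Y within stratum i (not all zero)
    N        : total sample size
    """
    theta_bar = sum(pi * ti for pi, ti in zip(p, theta))   # overall mean
    sigma_bar = sum(pi * si for pi, si in zip(p, sigma))   # weighted mean s.d.

    # Naive estimator: Var(Y) / N, with Var(Y) = E(Y^2) - theta^2.
    second_moment = sum(pi * (si ** 2 + ti ** 2)
                        for pi, si, ti in zip(p, sigma, theta))
    var_naive = (second_moment - theta_bar ** 2) / N

    # Proportional allocation n_i = N p_i.
    var_ps = sum(pi * si ** 2 for pi, si in zip(p, sigma)) / N

    # Optimal allocation n_i* = N p_i sigma_i / sum_j p_j sigma_j.
    var_opt = sigma_bar ** 2 / N
    n_opt = [N * pi * si / sigma_bar for pi, si in zip(p, sigma)]

    # The three components of the naive variance, in the order used in the text.
    components = (
        sum(pi * (ti - theta_bar) ** 2 for pi, ti in zip(p, theta)) / N,
        sum(pi * (si - sigma_bar) ** 2 for pi, si in zip(p, sigma)) / N,
        sigma_bar ** 2 / N,
    )
    return var_naive, var_ps, var_opt, n_opt, components


# Hypothetical three-stratum example (values chosen only for illustration).
var_naive, var_ps, var_opt, n_opt, components = stratified_variances(
    p=[0.5, 0.3, 0.2], theta=[1.0, 2.0, 4.0], sigma=[1.0, 2.0, 3.0], N=100
)
print(var_naive, var_ps, var_opt)
print(n_opt, components)
```

For these made-up inputs the three variances come out near 0.0479, 0.0350 and 0.0289, and the three components add up to the naive variance. The optimal allocation is generally not an integer, so in practice it would have to be rounded.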
If proportional sampling is used (it is often more convenient than optimum sampling, which requires estimation of the stratum variances $\sigma_i^2$ through some pilot runs), then we choose a stratification variable that tends to minimize the residual variance $E_X[\sigma^{2}(Y\mid X)]$ or, equivalently, one that tends to maximize $\operatorname{Var}_X[E(Y\mid X)]$. The decomposition of $\operatorname{Var}(\hat\theta)$ into these two terms shows that, with a fine enough proportional stratification, all the variation in $Y$ that is due to the variation in $E(Y\mid X)$ can be removed.
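A small simulation may help make this criterion concrete. The sketch below rests on stated assumptions: the model $Y = 20(X-0.5)^2 + \varepsilon$ with $X$ uniform on $(0,1)$ and Gaussian noise of standard deviation 0.5 is hypothetical and is not taken from the text. It was chosen because $X$ and $Y$ are uncorrelated while $E(Y\mid X)$ varies strongly with $X$, so $\operatorname{Var}_X[E(Y\mid X)]$ is large. The function names, the number of strata and the seed are likewise illustrative.

```python
import random
import statistics


def sample_y(x, rng):
    # Hypothetical model: E(Y | X = x) = 20 (x - 0.5)^2, noise s.d. 0.5.
    return 20.0 * (x - 0.5) ** 2 + rng.gauss(0.0, 0.5)


def naive_estimate(n, rng):
    # Plain Monte Carlo: X drawn uniformly on (0, 1), no stratification.
    return sum(sample_y(rng.random(), rng) for _ in range(n)) / n


def proportional_estimate(n, m, rng):
    # m equiprobable strata [i/m, (i+1)/m); proportional allocation
    # puts n // m draws in each stratum.
    per_stratum = n // m
    total = 0.0
    for i in range(m):
        for _ in range(per_stratum):
            x = (i + rng.random()) / m   # X uniform within stratum i
            total += sample_y(x, rng)
    return total / (per_stratum * m)


def compare(n=100, m=50, reps=200, seed=1):
    # Empirical variance of each estimator over independent replications.
    rng = random.Random(seed)
    naive = [naive_estimate(n, rng) for _ in range(reps)]
    strat = [proportional_estimate(n, m, rng) for _ in range(reps)]
    return statistics.variance(naive), statistics.variance(strat)


var_naive, var_strat = compare()
print("naive:", var_naive, "proportional stratified:", var_strat)
```

Under this model the naive variance per observation is roughly 2.47, of which about 2.22 is $\operatorname{Var}_X[E(Y\mid X)]$, so fine proportional stratification on $X$ should leave little more than the noise variance of 0.25. The empirical variances printed by the sketch should reflect a reduction of that order, up to sampling error.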