Additional Single-Equation Topics

Estimation with Generated Regressors and Instruments

OLS with Generated Regressors

We often need to draw on results for OLS estimation when one or more of the regressors have been estimated from a first-stage procedure. To illustrate the issues, consider the model

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \gamma q + u. $$

We observe $x_1, \ldots, x_K$, but $q$ is unobserved. However, suppose that $q$ is related to observable data through the function $q = f(\mathbf{w}, \boldsymbol{\delta})$, where $f$ is a known function and $\mathbf{w}$ is a vector of observed variables, but the vector of parameters $\boldsymbol{\delta}$ is unknown (which is why $q$ is not observed). Often, but not always, $q$ will be a linear function of $\mathbf{w}$ and $\boldsymbol{\delta}$. Suppose that we can consistently estimate $\boldsymbol{\delta}$, and let $\hat{\boldsymbol{\delta}}$ be the estimator. For each observation $i$, $\hat{q}_i = f(\mathbf{w}_i, \hat{\boldsymbol{\delta}})$ effectively estimates $q_i$. Pagan (1984) calls $\hat{q}_i$ a generated regressor. It seems reasonable that replacing $q_i$ with $\hat{q}_i$ and running the OLS regression

$$ y_i \ \text{on} \ 1,\, x_{i1},\, x_{i2},\, \ldots,\, x_{iK},\, \hat{q}_i, \qquad i = 1, \ldots, N, $$

should produce consistent estimates of all parameters, including $\gamma$. The question is: what assumptions are sufficient?

While we do not cover the asymptotic theory needed for a careful proof until Chapter 12 (which treats nonlinear estimation), we can provide some intuition here. Because $\operatorname{plim} \hat{\boldsymbol{\delta}} = \boldsymbol{\delta}$, by the law of large numbers it is reasonable that

$$ N^{-1} \sum_{i=1}^{N} \hat{q}_i u_i \overset{p}{\longrightarrow} \mathrm{E}(qu), \qquad N^{-1} \sum_{i=1}^{N} x_{ij} \hat{q}_i \overset{p}{\longrightarrow} \mathrm{E}(x_j q). $$

From this relation it is easily shown that the usual OLS assumption in the population, namely that $u$ is uncorrelated with $x_1, x_2, \ldots, x_K$, and $q$, suffices for the two-step procedure to be consistent (along with the rank condition applied to the expanded vector of explanatory variables). In other words, as far as consistency is concerned, replacing $q_i$ with $\hat{q}_i$ in an OLS regression causes no problems.

Things are not so simple when it comes to inference: the standard errors and test statistics obtained from the regression are generally invalid because they ignore the sampling variation in $\hat{\boldsymbol{\delta}}$. Since $\hat{\boldsymbol{\delta}}$ is also obtained using data (usually the same sample of data), uncertainty in the estimate should be accounted for when conducting inference in the second step.
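
To make the two-step procedure concrete, the following is a minimal simulation sketch in Python. It is an illustration only: the numerical parameter values, the linear choice of $f(\mathbf{w}, \boldsymbol{\delta})$, and the noisy first-stage measurement `z` used to estimate $\boldsymbol{\delta}$ are all assumptions introduced for the example, not part of the text above.

```python
import numpy as np

# Illustrative simulation of OLS with a generated regressor.
# Model: y = b0 + b1*x1 + gamma*q + u, with q = f(w, delta) = delta0 + delta1*w.
# q is unobserved; we assume a noisy measurement z = q + e is available so that
# delta can be estimated in a first stage, and q_hat = f(w, delta_hat) is then
# used as a regressor in the second stage.

rng = np.random.default_rng(0)
N = 10_000

# True parameter values (chosen arbitrarily for the illustration)
b0, b1, gamma = 1.0, 0.5, 2.0
delta0, delta1 = 0.3, 1.5

w = rng.normal(size=N)
x1 = rng.normal(size=N)
q = delta0 + delta1 * w                      # unobserved regressor
u = rng.normal(size=N)
y = b0 + b1 * x1 + gamma * q + u

# First stage: estimate delta by regressing the noisy measurement z on (1, w)
e = rng.normal(size=N)
z = q + e
W = np.column_stack([np.ones(N), w])
delta_hat, *_ = np.linalg.lstsq(W, z, rcond=None)
q_hat = W @ delta_hat                        # the generated regressor

# Second stage: OLS of y on (1, x1, q_hat)
X = np.column_stack([np.ones(N), x1, q_hat])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("second-stage estimates (b0, b1, gamma):", beta_hat)

# Caution: the usual OLS standard errors from this second stage ignore the
# sampling variation in delta_hat and are generally invalid for inference.
```

In a fuller treatment the second-stage standard errors would be adjusted for the first-stage estimation (or the entire two-step procedure resampled), which is exactly the inference problem raised at the end of the passage.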