**AIOU Solved Assignments Code 807 M.A. Spring 2020 Assignment 1 & 2** **Course: Department of Economics (807)**, Spring 2020. AIOU past papers

**ASSIGNMENT No: 1 & 2**

**Department of Economics (807)** Semester: **Spring, 2020**

## AIOU Solved Assignment 1 & 2 Code 807 Spring 2020

Q.No.1 Explain the concept of Best Linear Unbiased Estimator (BLUE). Prove that ordinary least squares (OLS) estimates are BLUE, both in mathematical and in matrix form.

Ans:- The Gauss-Markov theorem states that if your linear regression model satisfies the first six classical assumptions, then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance of all linear unbiased estimators.

A fully formal proof of this theorem goes beyond the scope of this post. The critical point, however, is that when you satisfy the classical assumptions, you can be confident that you are obtaining the best coefficient estimates that any linear unbiased method can deliver. The Gauss-Markov theorem does not merely say that these are the best estimates the OLS procedure can produce; it says they are the best among all linear unbiased estimators. Think about that!

In my post about the classical assumptions of OLS linear regression, I explain those assumptions and how to verify them. In this post, I take a closer look at the nature of OLS estimates. What does the Gauss-Markov theorem mean exactly when it states that OLS estimates are the best estimates when the assumptions hold true?

The Gauss-Markov Theorem: OLS is BLUE!

The Gauss-Markov theorem famously states that OLS is BLUE. BLUE is an acronym for the following:

Best Linear Unbiased Estimator

In this context, the definition of “best” refers to the minimum variance or the narrowest sampling distribution. More specifically, when your model satisfies the assumptions, OLS coefficient estimates follow the tightest possible sampling distribution of unbiased estimates compared to other linear estimation methods.

Let’s dig deeper into everything that is packed into that sentence!

What Does OLS Estimate?

Regression analysis is like any other inferential methodology. Our goal is to draw a random sample from a population and use it to estimate the properties of that population. In regression analysis, the coefficients in the equation are estimates of the actual population parameters.

The notation for the model of a population is the following:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

The betas (β) represent the population parameter for each term in the model. Epsilon (ε) represents the random error that the model doesn’t explain. Unfortunately, we’ll never know these population values because it is generally impossible to measure the entire population. Instead, we’ll obtain estimates of them using our random sample.

The notation for an estimated model from a random sample is the following:

$$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_k X_k + e$$

The hats over the betas indicate that these are parameter estimates while e represents the residuals, which are estimates of the random error.

Typically, statisticians consider estimates to be useful when they are unbiased (correct on average) and precise (minimum variance). To apply these concepts to parameter estimates and the Gauss-Markov theorem, we’ll need to understand the sampling distribution of the parameter estimates.

Sampling Distributions of the Parameter Estimates

Imagine that we repeat the same study many times. We collect random samples of the same size, from the same population, and fit the same OLS regression model repeatedly. Each random sample produces different estimates for the parameters in the regression equation. After this process, we can graph the distribution of estimates for each parameter. Statisticians refer to this type of distribution as a sampling distribution, which is a type of probability distribution.

Keep in mind that each curve represents the sampling distribution of the estimates for a single parameter. The graphs below tell us which values of parameter estimates are more and less common. They also indicate how far estimates are likely to fall from the correct value.

Of course, when you conduct a real study, you’ll perform it once, not know the actual population value, and you definitely won’t see the sampling distribution. Instead, your analysis draws one value from the underlying sampling distribution for each parameter. However, using statistical principles, we can understand the properties of the sampling distributions without having to repeat a study many times. Isn’t the field of statistics grand?!
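The repeated-study thought experiment above can be mimicked directly in code. The following is a minimal sketch, assuming a hypothetical population model Y = 2 + 0.5X + ε with normally distributed errors and regressors held fixed across samples; the parameter values, sample size and number of replications are arbitrary illustrative choices, not taken from the original. It refits OLS on each simulated sample and summarizes the resulting sampling distribution of the slope.

```python
import random
import statistics


def ols_slope_intercept(x, y):
    """Simple-regression OLS estimates, returned as (intercept, slope)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sampling_distribution(n_reps=2000, n=30, beta0=2.0, beta1=0.5, sigma=1.0, seed=1):
    """Repeat the 'same study' n_reps times and collect the slope estimates.

    Hypothetical population model (assumed for illustration):
        Y = beta0 + beta1 * X + eps,  eps ~ N(0, sigma^2)
    """
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]  # regressors held fixed across samples
    slopes = []
    for _ in range(n_reps):
        # A fresh random sample: same X values, new random errors.
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        slopes.append(ols_slope_intercept(x, y)[1])
    return slopes


slopes = sampling_distribution()
print(f"mean of slope estimates: {statistics.fmean(slopes):.3f}")  # close to beta1 = 0.5
print(f"std. dev. of estimates:  {statistics.stdev(slopes):.3f}")
```

A histogram of `slopes` would show a bell-shaped sampling distribution centered near 0.5; changing `sigma` or `n` changes its spread but not its center.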

Hypothesis tests also use sampling distributions to calculate p-values and create confidence intervals. For more information about this process, read my post: How Hypothesis Tests Work.

Unbiased Estimates: Sampling Distributions Centered on the True Population Parameter

In the graph below, beta represents the true population value. The curve on the right centers on a value that is too high. This model tends to produce estimates that are too high, which is a positive bias. It is not correct on average. However, the curve on the left centers on the actual value of beta. That model produces parameter estimates that are correct on average. The expected value is the actual value of the population parameter. That’s what we want and satisfying the OLS assumptions helps us!

Keep in mind that the curve on the left doesn’t indicate that an individual study necessarily produces an estimate that is right on target. Instead, it means that OLS produces the correct estimate on average when the assumptions hold true. Different studies will generate values that are sometimes higher and sometimes lower—as opposed to having a tendency to be too high or too low.

Minimum Variance: Sampling Distributions are Tight Around the Population Parameter

In the graph below, both curves center on beta. However, one curve is wider than the other because the variances are different. Broader curves indicate that there is a higher probability that the estimates will be further away from the correct value. That’s not good. We want our estimates to be close to beta.

Both studies are correct on average. However, we want our estimates to follow the narrower curve because they’re likely to be closer to the correct value than those from the wider curve. The Gauss-Markov theorem states that satisfying the OLS assumptions keeps the sampling distribution as tight as possible among linear unbiased estimates.

The Best in BLUE refers to the sampling distribution with the minimum variance. That’s the tightest possible distribution of all unbiased linear estimation methods!
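To see "best" at work, the sketch below compares the OLS slope with another estimator that is also linear in Y and unbiased: the slope of the line through the first and last observations. This "endpoint" estimator is a hypothetical comparison chosen purely for illustration, as are the fixed design points 1 to 20 and the parameter values. Both estimators should center on the true slope, but under the Gauss-Markov assumptions the OLS estimates should show the smaller variance.

```python
import random
import statistics


def ols_slope(x, y):
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def endpoint_slope(x, y):
    # Linear in y and unbiased when E(eps) = 0, but it ignores the interior points.
    return (y[-1] - y[0]) / (x[-1] - x[0])


def compare(n_reps=5000, beta0=1.0, beta1=0.5, sigma=1.0, seed=7):
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]  # hypothetical fixed design points 1..20
    ols, endpoint = [], []
    for _ in range(n_reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        ols.append(ols_slope(x, y))
        endpoint.append(endpoint_slope(x, y))
    return ols, endpoint


ols, endpoint = compare()
for name, est in (("OLS", ols), ("endpoint", endpoint)):
    print(f"{name:8s} mean={statistics.fmean(est):.3f} var={statistics.variance(est):.5f}")
```

For these design points the theoretical variances are σ²/Σ(X − X̄)² = 1/665 for OLS and 2σ²/19² = 2/361 for the endpoint estimator, so OLS should come out roughly 3.7 times less variable while both means stay near 0.5.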

Gauss-Markov Theorem OLS Estimates and Sampling Distributions

As you can see, the best estimates are those that are unbiased and have the minimum variance. When your model satisfies the assumptions, the Gauss-Markov theorem states that the OLS procedure produces unbiased estimates that have the minimum variance. The sampling distributions are centered on the actual population value and are the tightest possible distributions. Finally, these aren’t just the best estimates that OLS can produce, but the best estimates that any linear unbiased estimator can produce. Powerful stuff!
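Because the question asks for a proof, the standard textbook argument is sketched here in both forms. It assumes the usual Gauss-Markov conditions: the regressors are fixed in repeated samples, E(ε) = 0, Var(ε) = σ²I, and X has full column rank. The auxiliary symbols ($x_i$, $k_i$, $c_i$, $d_i$, $C$, $D$) are introduced only for this derivation.

Mathematical (scalar) form, for the simple regression $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with $x_i = X_i - \bar{X}$:

$$\begin{aligned}
\hat{\beta}_1 &= \sum k_i Y_i, \qquad k_i = \frac{x_i}{\sum x_j^2}, \qquad \sum k_i = 0,\quad \sum k_i X_i = 1,\quad \sum k_i^2 = \frac{1}{\sum x_i^2} \\
\hat{\beta}_1 &= \beta_1 + \sum k_i \varepsilon_i \;\Rightarrow\; E(\hat{\beta}_1) = \beta_1, \qquad \operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2} \\
\tilde{\beta}_1 &= \sum c_i Y_i \text{ is unbiased} \iff \sum c_i = 0,\quad \sum c_i X_i = 1 \\
c_i &= k_i + d_i \;\Rightarrow\; \sum d_i = 0,\quad \sum d_i X_i = 0,\quad \sum k_i d_i = \frac{\sum d_i X_i - \bar{X}\sum d_i}{\sum x_j^2} = 0 \\
\operatorname{Var}(\tilde{\beta}_1) &= \sigma^2 \sum c_i^2 = \sigma^2 \sum k_i^2 + \sigma^2 \sum d_i^2 = \operatorname{Var}(\hat{\beta}_1) + \sigma^2 \sum d_i^2 \;\ge\; \operatorname{Var}(\hat{\beta}_1)
\end{aligned}$$

Equality holds only when every $d_i = 0$, that is, when $\tilde{\beta}_1$ is the OLS estimator itself.

Matrix form, for $y = X\beta + \varepsilon$:

$$\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon \;\Rightarrow\; E(\hat{\beta}) = \beta, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1} \\
\tilde{\beta} &= Cy, \qquad C = (X'X)^{-1}X' + D \;\Rightarrow\; E(\tilde{\beta}) = \beta + DX\beta \\
&\text{unbiased for every } \beta \iff DX = 0 \;\Rightarrow\; \tilde{\beta} = \beta + C\varepsilon \\
\operatorname{Var}(\tilde{\beta}) &= \sigma^2 CC' = \sigma^2\left[(X'X)^{-1} + DX(X'X)^{-1} + (X'X)^{-1}X'D' + DD'\right] = \sigma^2 (X'X)^{-1} + \sigma^2 DD'
\end{aligned}$$

The last step uses $DX = 0$ (and hence $X'D' = 0$). Since $DD'$ is positive semidefinite, $\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta}) = \sigma^2 DD'$ is positive semidefinite, so no linear unbiased estimator has a smaller variance than OLS for any coefficient: OLS is BLUE.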

Q.No.2 What are the properties of the error term in a simple regression model? What assumption is made about the probability distribution of the error term?

Ans:- An error term is a residual variable produced by a statistical or mathematical model, which arises when the model does not fully represent the actual relationship between the independent variables and the dependent variable. As a result of this incomplete relationship, the error term is the amount by which the equation may differ from the observed data during empirical analysis.

The error term is also known as the disturbance or remainder term (and is often loosely called the residual), and is variously represented in models by the letters e, ε, or u.

KEY TAKEAWAYS

An error term appears in a statistical model, like a regression model, to indicate the uncertainty in the model.

The error term is a residual variable that accounts for a lack of perfect goodness of fit.

Heteroskedasticity refers to a condition in which the variance of the residual term, or error term, in a regression model is not constant but varies widely across observations.

Understanding an Error Term

An error term represents the margin of error within a statistical model; it is the deviation of an observed value from the regression line, which explains the difference between the theoretical value of the model and the actual observed result. The regression line is used as a point of analysis when attempting to determine the correlation between one independent variable and one dependent variable.

Error Term Use in a Formula

An error term essentially means that the model is not completely accurate and will produce results that differ from those observed in real-world applications. For example, assume there is a multiple linear regression function that takes the following form:

$$\begin{aligned} &Y = \alpha X + \beta \rho + \epsilon \\ &\textbf{where:} \\ &\alpha, \beta = \text{Constant parameters} \\ &X, \rho = \text{Independent variables} \\ &\epsilon = \text{Error term} \end{aligned}$$

When the actual Y differs from the expected or predicted Y in the model during an empirical test, then the error term does not equal 0, which means there are other factors that influence Y.
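The properties usually attached to the error term in the classical simple regression model, including its assumed probability distribution, can be summarized as follows. These are the standard textbook assumptions, written here for the model $Y_i = \beta_0 + \beta_1 X_i + u_i$, a notation introduced for this summary:

$$\begin{aligned}
&E(u_i \mid X_i) = 0 && \text{(zero mean)} \\
&\operatorname{Var}(u_i \mid X_i) = \sigma^2 && \text{(constant variance, homoskedasticity)} \\
&\operatorname{Cov}(u_i, u_j \mid X_i, X_j) = 0, \quad i \neq j && \text{(no autocorrelation)} \\
&\operatorname{Cov}(u_i, X_i) = 0 && \text{(error uncorrelated with the regressor)} \\
&u_i \sim N(0, \sigma^2) && \text{(normality)}
\end{aligned}$$

The normality assumption is not needed for the Gauss-Markov (BLUE) result, but it makes the OLS estimators themselves normally distributed, which justifies the usual t and F tests in small samples.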

What Do Error Terms Tell Us?

Within a linear regression model tracking a stock’s price over time, the error term is the difference between the expected price at a particular time and the price that was actually observed. In instances where the price is exactly what was anticipated at a particular time, the price will fall on the trend line and the error term will be zero.

Points that do not fall directly on the trend line exhibit the fact that the dependent variable, in this case, the price, is influenced by more than just the independent variable, representing the passage of time. The error term stands for any influence being exerted on the price variable, such as changes in market sentiment.

The data points with the greatest distance from the trend line represent the largest margin of error.

Heteroskedasticity, a common problem in interpreting statistical models correctly, refers to a condition in which the variance of the error term in a regression model varies widely across observations instead of remaining constant.
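A small simulation makes the definition concrete. The sketch below assumes a hypothetical model Y = 5 + 2X + ε in which the standard deviation of the error grows in proportion to X (one common form of heteroskedasticity, chosen for illustration). It fits OLS and compares the spread of the residuals for the lower and upper halves of the X range.

```python
import random
import statistics


def fit_ols(x, y):
    """Return (intercept, slope) of a simple OLS regression."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


rng = random.Random(3)
x = [1 + 9 * i / 399 for i in range(400)]              # X from 1 to 10, sorted
y = [5 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # error s.d. = 0.5 * X (assumed)

a, b = fit_ols(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
low = statistics.stdev(resid[:200])    # residuals for the lower half of X
high = statistics.stdev(resid[200:])   # residuals for the upper half of X
print(f"residual s.d. low X: {low:.2f}, high X: {high:.2f}")
```

Under a homoskedastic model the two printed values would be similar; here the upper-half spread should be roughly twice the lower-half spread.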

Linear Regression, Error Term, and Stock Analysis

Linear regression is a form of analysis that relates to current trends experienced by a particular security or index by providing a relationship between a dependent and an independent variable, such as the price of a security and the passage of time, resulting in a trend line that can be used as a predictive model.

A linear regression exhibits less delay than that experienced with a moving average, as the line is fit to the data points instead of based on the averages within the data. This allows the line to change more quickly and dramatically than a line based on numerical averaging of the available data points.
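The lag claim can be checked on an idealized series. The sketch below assumes a hypothetical price series that rises by exactly 1 per period and compares, at the last period, a 10-period simple moving average with the value of a 10-period least-squares trend line fitted to the same window. The window length and series are illustrative choices.

```python
import statistics


def sma(values, n):
    """Simple moving average of the last n values."""
    return statistics.fmean(values[-n:])


def regression_endpoint(values, n):
    """Fit y = a + b*t by OLS to the last n values; return the fitted value at the last t."""
    window = values[-n:]
    t = list(range(n))
    t_bar, y_bar = statistics.fmean(t), statistics.fmean(window)
    b = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, window))
         / sum((ti - t_bar) ** 2 for ti in t))
    a = y_bar - b * t_bar
    return a + b * (n - 1)


prices = [100.0 + day for day in range(50)]  # hypothetical steady uptrend, +1 per period
print(prices[-1], sma(prices, 10), regression_endpoint(prices, 10))  # 149.0 144.5 149.0
```

On this linear trend the moving average trails the latest price by (n − 1)/2 = 4.5 periods, while the fitted trend line reaches the latest price exactly. With noisy data the regression line is no longer exact, but it avoids that systematic lag on a linear trend.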

The Difference Between Error Terms and Residuals

Although the error term and residual are often used synonymously, there is an important formal difference. An error term is generally unobservable, whereas a residual is observable and calculable, making it much easier to quantify and visualize. In effect, an error term represents how an observed value differs from the true population regression line, while a residual represents how it differs from the regression line estimated from the sample data.
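Because the true errors can only be seen in a simulation, a simulation is a natural way to show the difference. The sketch below assumes a hypothetical population line Y = 3 + 1.5X, keeps the true errors, fits OLS, and computes the residuals. The residuals satisfy the OLS normal equations exactly (they sum to zero and are orthogonal to X), while the unobservable errors do not.

```python
import random
import statistics

rng = random.Random(11)
n = 50
x = [rng.uniform(0, 20) for _ in range(n)]
errors = [rng.gauss(0, 2) for _ in range(n)]          # unobservable in practice
y = [3 + 1.5 * xi + ei for xi, ei in zip(x, errors)]  # hypothetical population line

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observable, computed from the sample

print(f"sum of errors:      {sum(errors):8.4f}")
print(f"sum of residuals:   {sum(residuals):8.4f}")  # zero up to rounding
print(f"sum of x*residuals: {sum(xi * r for xi, r in zip(x, residuals)):8.4f}")  # zero up to rounding
```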

Q.No.3 Let Ŷ = X(X′X)⁻¹X′Y. Find the OLS coefficient from a regression of Ŷ on X.

Ans:- In agricultural research we are often interested in describing the change in one variable (Y, the dependent variable) in terms of a unit change in a second variable (X, the independent variable). Regression is commonly used to establish such a relationship. A simple linear regression takes the form Ŷ = a + bX, where Ŷ is the predicted value of Y for a given value of X, a estimates the intercept of the regression line with the Y axis, and b estimates the slope or rate of change in Y for a unit change in X. The regression coefficients, a and b, are calculated from a set of paired values of X and Y. The problem of determining the best values of a and b involves the principle of least squares.

10.1 The Regression Equation

To illustrate the principle, we will use the artificial data presented as a scatter diagram in Figure 10-1.

Figure 10-1. A scatter diagram to illustrate the linear relationship between 2 variables.

Because of the existence of experimental errors, the observations (Y) made for a given set of independent values (X) will not permit the calculation of a single straight line that goes through all the points. The least squares line is the line that passes through the points so that the sum of the squares of the vertical deviations of the points from the line is minimal. Those with a knowledge of calculus should recognize that this is a problem of finding the minimum value of a function: set the first derivatives of the sum of squared deviations with respect to a and b to zero and solve for a and b. This procedure yields the following formulas for a and b based on k pairs of X and Y:

$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$

If X is not a random variable, the coefficients so obtained are the best linear unbiased estimates of the true parameters.

Independent Variable – Fixed Design Points

In Chapter 9, we showed that a linear response was appropriate to describe the effect of N fertilizer on the sucrose content of beet roots.
Note that the N rates were specifically chosen by the experimenter and, therefore, are considered fixed design points. The differences in the levels are not random. We now show the computation of the regression equation for this situation. The first step is to complete a scatter diagram of the mean responses of % sucrose to increasing levels of N. The data are given in Table 10-1 and the scatter diagram in Figure 10-2. The construction of the least squares line is as follows:

Table 10-1. Elements necessary to compute the least squares regression for changes in % sucrose associated with changes in N fertilizer.

| X (lb N/acre) | Y (mean % sucrose) | X² | XY | Ŷ (predicted % sucrose) | Y − Ŷ |
|---|---|---|---|---|---|
| 0 | 16.16 | 0 | 0 | 16.22 | −0.06 |
| 50 | 15.74 | 2,500 | 787 | 15.78 | −0.04 |
| 100 | 15.29 | 10,000 | 1,529 | 15.35 | −0.06 |
| 150 | 15.29 | 22,500 | 2,293.5 | 14.92 | 0.37 |
| 200 | 14.36 | 40,000 | 2,872 | 14.48 | −0.12 |
| 250 | 13.94 | 62,500 | 3,485 | 14.05 | −0.11 |

From these elements, b = Σxy/Σx² = −381/43,750 = −0.0087 (where x = X − X̄ and y = Y − Ȳ), and a = Ȳ − bX̄ = 15.13 − (−0.0087)(125) = 16.22. The resulting regression equation is Ŷ = 16.22 − 0.0087X. This equation says that for every additional pound of fertilizer N, % sucrose decreases by 0.0087 percentage points. Our best estimate of percent sucrose from 0 to 250 lb N/acre is obtained by substituting the N rate into the regression equation and calculating Ŷ (the Ŷ column of Table 10-1). For example, to estimate % sucrose for 125 lb N/acre, Ŷ = 16.22 − 0.0087(125) = 15.13.

Independent Variable – Measurement with Error

Sometimes researchers are interested in estimating a quantity that is difficult to measure directly. It is desirable to be able to predict this quantity from another variable that is easier to measure, for example, to predict leaf area from the length and width of leaves, sugar content from percent total solids, or rate of gain from initial body weight. For a case study we will use data collected to see whether it is possible to predict the weight of the livers of mice from their body weights.
The data are given in Table 10-2 and the calculation of the regression line is shown below the table.

Table 10-2. Mice body and liver weights (grams) and predicted liver weights from a linear regression of Y on X.

The predicted values of Y are obtained by substituting the X values into the regression equation. The values of Ŷ in Table 10-2 were calculated to several decimal places and rounded off, and therefore will not be exactly equal to values obtained by using the regression equation given above. The relation between body and liver weights and the regression line are plotted in Figure 10-3.

Figure 10-3. Linear regression of liver weight (g) on body weight (10 g) of mice.

Note that the calculation procedures for determining the regressions of Figures 10-2 and 10-3 are identical. However, in the case where X values are measured with error, there are two variances, one associated with measuring Y and the other with measuring X, and the estimated regression coefficient (b) is biased toward zero. The effect of the error in X on the standard error of b is not always in one direction, but the ratio of the regression coefficient to its standard error (the t statistic for testing for a nonzero slope) is always smaller in absolute value than in the case where X values can be fixed experimentally without error. Therefore, the probability of detecting a nonzero slope is decreased. Thus an experimenter may be justified in selecting a higher probability for rejection of the null hypothesis (e.g., 10% rather than 5%). We now turn to the consideration of the validity and usefulness of regression equations.
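Before turning to the analysis of variance, a short simulation can illustrate why measurement error in X biases the estimated slope toward zero (attenuation). The sketch assumes a hypothetical true relationship Y = 1 + 2X with X ~ N(0, 1), and adds independent measurement error with variance 1 to X. Under these assumptions the slope fitted to the error-laden X should be pulled toward 2 × Var(X)/(Var(X) + Var(error)) = 1.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))


rng = random.Random(5)
n = 5000
true_x = [rng.gauss(0, 1) for _ in range(n)]               # Var(X) = 1
y = [1 + 2 * xi + rng.gauss(0, 0.5) for xi in true_x]      # true slope = 2
observed_x = [xi + rng.gauss(0, 1) for xi in true_x]       # measurement error, variance 1

b_exact = slope(true_x, y)       # X measured without error
b_noisy = slope(observed_x, y)   # X measured with error
print(f"slope with exact X: {b_exact:.3f}")
print(f"slope with noisy X: {b_noisy:.3f}  (expected near 2 * 1/(1+1) = 1)")
```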
10.2 The Analysis of Variance of Regression

The total sum of squares of the dependent variable (Y) can be partitioned into two components: one due to the regression line and the other not explainable by the regression line. The deviation of each Y from Ȳ is made up of a deviation due to regression, Ŷ − Ȳ, and a deviation from regression, Y − Ŷ. Σ(Ŷ − Ȳ)² is the sum of squares due to regression, SSR, with mean square MSR. Σ(Y − Ŷ)² is the sum of squares not explainable by the regression line; it is called the residual sum of squares, Ssr, with mean square Msr. This information can be summarized in an analysis of variance table (Table 10-3).

Table 10-3. Analysis of variance of regression.

| Source | DF | SS | MS | F |
|---|---|---|---|---|
| Total | k − 1 | Σ(Y − Ȳ)² | | |
| Regression | 1 | Σ(Ŷ − Ȳ)² | MSR | MSR/Msr |
| Residual | k − 2 | Σ(Y − Ŷ)² | Msr | |

The F test, MSR/Msr, tests the null hypothesis that the true regression coefficient equals zero, β = 0, against the alternative hypothesis that β ≠ 0. This test is only valid when Msr estimates the variance of the experimental error. However, this condition cannot be tested unless there are replications of Y values for each X, so that the true experimental error can be estimated. With the machine formulas SSY = ΣY² − (ΣY)²/k and SSR = (Σxy)²/Σx², where x and y are deviations from their means, the residual sum of squares is obtained by subtraction: Ssr = SSY − SSR = 2.02 − 1.53 = 0.49. The significant F test suggests that there is a nonzero regression coefficient. However, due to the lack of replication, no rigorous assessment of lack of fit to the model can be made.

10.3 Testing Fitness of a Regression Model

In this section, data on nitrogen content in corn crops obtained from a completely randomized design (CRD) field experiment will be used to illustrate the procedure for testing the fitness of a regression model. Five levels of fertilization with a stable isotopic formulation of ammonium sulfate were involved in the experiment. This formulation enabled the researcher to distinguish between nitrogen in the crop derived from the fertilizer and from the soil. The data are shown in Table 10-5.

Table 10-5. Nitrogen (lb/acre) in a corn crop (green, cobs and fodder) derived from 5 rates of N15-depleted ammonium sulfate.

| Fertilizer N (lb/acre) | Rep. 1 | Rep. 2 | Rep. 3 | Mean |
|---|---|---|---|---|
| 50 | 20.47 | 20.91 | 18.15 | 19.84 |
| 100 | 41.61 | 44.07 | 60.03 | 48.57 |
| 150 | 89.06 | 86.27 | 87.16 | 87.50 |
| 200 | 83.83 | 116.16 | 120.67 | 106.89 |
| 250 | 121.43 | 250 | 153.68 | 133.45 |

The data in Table 10-5 are plotted in Figure 10-4, a practice that provides a visual examination of the response trend.

Figure 10-4. A plot of the data in Table 10-5.

The intercept and regression coefficient are calculated as shown in Section 10.1. This can be done using all observations or just the treatment means; with equal replication the coefficients will be the same. The regression coefficient (about 0.57) estimates the rate of fertilizer-N recovery by the crop, that is, 57% of the applied fertilizer-N is taken up by the corn crop. Note that the true intercept should not be less than zero, whereas the fitted intercept is negative (about −6.4); this indicates that this regression should not be extrapolated below 50 lb/acre fertilizer N. To test how well the regression model fits the data, we proceed with the analysis outlined in Table 10-6.

Table 10-6. AOV to test fitness of a regression model with k levels of treatment and n replications per treatment.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Total | kn − 1 | SSY | | |
| Regression | 1 | SSR | MSR | |
| Residual | kn − 2 | Ssr | Msr | |
| Deviation | k − 2 | SSD | MSD | MSD/MSE |
| Exp. error | k(n − 1) | SSE | MSE | |

The difference between Ssr and SSE measures the deviations of the data points from the regression line that are not due to experimental error. This is frequently called the "lack of fit" sum of squares and is denoted as the sum of squares of deviation, SSD:

SSD = Ssr − SSE

The ratio MSD/MSE provides an F test of the lack of fit of the regression model. A nonsignificant F value indicates that the deviation from the linear regression can be attributed to random error. Thus a linear regression is a good description of the relationship between the dependent and independent variables.
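The lack-of-fit partition SSD = Ssr − SSE is straightforward to compute from replicated data. The sketch below is a minimal, standard-library-only version; the function name `lack_of_fit` and the small dataset in the usage lines are hypothetical and only illustrate the mechanics (they are not the Table 10-5 data). The experimental-error sum of squares is the pooled within-level variation, and the deviation sum of squares is whatever the straight line leaves over and above it.

```python
import statistics


def lack_of_fit(levels, reps):
    """Lack-of-fit AOV for a straight-line fit with replicated Y at each X level.

    levels: list of k distinct X values
    reps:   list of k lists, the replicated Y observations at each level
    Returns a dict with sums of squares, (df_deviation, df_error) and F = MSD/MSE.
    """
    x = [xv for xv, ys in zip(levels, reps) for _ in ys]
    y = [yv for ys in reps for yv in ys]
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar

    ssy = sum((yi - y_bar) ** 2 for yi in y)                                 # total
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))         # Ssr
    sse = sum((yi - statistics.fmean(ys)) ** 2 for ys in reps for yi in ys)  # experimental error
    ssd = ss_resid - sse                                                     # lack of fit (deviation)

    k, total_n = len(levels), len(y)
    df_dev, df_err = k - 2, total_n - k
    return {
        "SSY": ssy, "SSR": ssy - ss_resid, "Ssr": ss_resid,
        "SSD": ssd, "SSE": sse,
        "F_lack_of_fit": (ssd / df_dev) / (sse / df_err),
        "df": (df_dev, df_err),
    }


# Hypothetical data: 4 X levels, 3 replicates each (illustration only)
levels = [10, 20, 30, 40]
reps = [[5.1, 4.8, 5.4], [7.9, 8.3, 8.0], [11.2, 10.7, 11.0], [13.8, 14.4, 14.1]]
result = lack_of_fit(levels, reps)
print(result)
```

The F value would then be compared with the F distribution on (k − 2, N − k) degrees of freedom, exactly as MSD/MSE is read in Table 10-6.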
A significant F test would suggest the existence of a nonrandom deviation from the linear model and that the data may be better described by some other model. For the data in Table 10-5, the AOV is given in Table 10-7.

Table 10-7. AOV of the nitrogen recovery data.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Total | 14 | 26,302.81 | | |
| Regression | 1 | 24,459.36 | 24,459.36 | 172.49 |
| Residual | 13 | 1,843.45 | 141.80 | |
| Deviation | 3 | 262.09 | 87.36 | 0.55 |
| Exp. error | 10 | 1,581.35 | 158.13 | |

The nonsignificant lack-of-fit F test (F = 0.55) indicates that a linear regression is an adequate model to describe the uptake of fertilizer-N by corn. The hypothesis of a zero regression slope is then tested by using the residual mean square to form the test F = MSR/Msr = 172.49. This F is highly significant (P < 0.01), indicating that the null hypothesis should be rejected. If the lack-of-fit F test is significant, then MSE should be used instead of Msr to form the F test (F = MSR/MSE) of the hypothesis of a zero regression slope.
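The question itself has a short matrix answer. Regressing Ŷ on X gives

$$\tilde{\beta} = (X'X)^{-1}X'\hat{Y} = (X'X)^{-1}X'X(X'X)^{-1}X'Y = (X'X)^{-1}X'Y = \hat{\beta}$$

so the coefficients are identical to those from regressing Y on X. Equivalently, the hat matrix H = X(X'X)⁻¹X' is idempotent (HH = H), so projecting the fitted values onto X reproduces them exactly, with zero residuals. The pure-Python check below uses the sucrose data of Table 10-1 with an intercept column; the helper names (`transpose`, `matmul`, `solve`, `ols`) are hypothetical, and the normal equations are solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ beta = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        beta[i] = (m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))) / m[i][i]
    return beta


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    Xt = transpose(X)
    xtx = matmul(Xt, X)
    xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    return solve(xtx, xty)


# Table 10-1: N rate (lb/acre) and mean % sucrose
n_rate = [0, 50, 100, 150, 200, 250]
sucrose = [16.16, 15.74, 15.29, 15.29, 14.36, 13.94]
X = [[1.0, float(v)] for v in n_rate]  # intercept column plus N rate

beta_hat = ols(X, sucrose)                                         # regression of Y on X
y_hat = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
beta_tilde = ols(X, y_hat)                                         # regression of Y-hat on X

print([round(v, 4) for v in beta_hat])    # [16.2186, -0.0087]
print([round(v, 4) for v in beta_tilde])  # same coefficients
```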

Q.No.4 Explain hypothesis. What is the meaning of “accepting” or “rejecting” a hypothesis?

Ans:- Hypothesis testing was introduced by Ronald Fisher, Jerzy Neyman, Karl Pearson and Pearson’s son, Egon Pearson. Hypothesis testing is a statistical method used for making statistical decisions from experimental data. A hypothesis is basically an assumption that we make about a population parameter, and hypothesis testing evaluates whether the sample data are consistent with that assumption.

Key Terms and Concepts

Null hypothesis: The null hypothesis (H0) is a statistical hypothesis which assumes that any observed difference or effect is due to chance. For example, H0: μ1 = μ2 states that there is no difference between the two population means.

Alternative hypothesis: Contrary to the null hypothesis, the alternative hypothesis (H1) states that the observations are the result of a real effect, for example H1: μ1 ≠ μ2.

Level of significance: The level of significance (α) is the probability of rejecting the null hypothesis when it is actually true that we are willing to tolerate; it sets the threshold for accepting or rejecting the null hypothesis. Because 100% accuracy is not possible when accepting or rejecting a hypothesis, we select a level of significance, usually 5%.

Type I error: Rejecting the null hypothesis when it is in fact true. The probability of a Type I error is denoted by alpha (α); it equals the area of the sampling distribution that falls in the critical (rejection) region, which is why that region is sometimes called the alpha region.

Type II error: Accepting (failing to reject) the null hypothesis when it is in fact false. The probability of a Type II error is denoted by beta (β); it is the area of the sampling distribution under the alternative hypothesis that falls in the acceptance region.

Power: The probability of correctly rejecting the null hypothesis when it is false. Power equals 1 − β.

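One-tailed test: When the alternative hypothesis specifies a direction, such as H1: μ1 > μ2 (or H1: μ1 < μ2), the critical region lies in one tail of the sampling distribution and the test is called a one-tailed test.

Two-tailed test: When the alternative hypothesis states only that the parameter differs from the hypothesized value, such as H1: μ1 ≠ μ2, the critical region is split between both tails and the test is called a two-tailed test.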

Statistical Decision for Hypothesis Testing

In statistical analysis, we have to make decisions about the hypothesis: either we accept (more precisely, fail to reject) the null hypothesis or we reject it. Every hypothesis test produces a significance value (p-value) for that particular test. If the p-value is greater than the predetermined significance level, we accept the null hypothesis; if it is less than the predetermined level, we reject the null hypothesis. “Rejecting” H0 therefore means that the sample provides enough evidence against it at the chosen level, while “accepting” H0 only means that the evidence is insufficient to reject it; it does not prove that H0 is true. For example, if we want to see the degree of relationship between two stock prices and the p-value of the correlation coefficient is greater than the predetermined significance level, we accept the null hypothesis and conclude that the data show no evidence of a relationship between the two stock prices; any correlation observed in the sample is attributed to chance.
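The decision rule can be sketched for the simplest case, a two-tailed z test of H0: μ = μ0 with a known population standard deviation σ (a simplifying assumption; with unknown σ a t test would be used instead). The sketch computes the p-value from the standard normal CDF via `math.erf`, compares it with α, and then checks by simulation that, when H0 is true, the test rejects in roughly a proportion α of repeated samples, which is exactly the Type I error rate. The sample values and parameters are hypothetical.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 with known sigma. Returns (z, p_value, decision)."""
    n = len(sample)
    z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return z, p_value, decision


def type_one_error_rate(n_reps=20000, n=25, mu0=50.0, sigma=10.0, alpha=0.05, seed=42):
    """Share of samples drawn under a TRUE H0 in which the test wrongly rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if z_test(sample, mu0, sigma, alpha)[2] == "reject H0":
            rejections += 1
    return rejections / n_reps


print(z_test([52.1, 55.3, 49.8, 58.2, 54.0, 51.7, 56.4, 53.3], mu0=50, sigma=5))
rate = type_one_error_rate()
print(f"simulated Type I error rate: {rate:.3f}")  # close to alpha = 0.05
```

The wording "do not reject H0" in the code reflects what "accepting" the null hypothesis means in practice.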

Q.No.5 Write notes on the following:

(a) Two – stage least squares

(b) Three – stage least squares

Ans:- Two-stage least squares (2SLS) regression analysis is a statistical technique used in the analysis of structural equations. It is an extension of the OLS method, used when the error term of the dependent variable is correlated with one or more of the independent variables. It is also useful when there are feedback loops in the model. In structural equation modeling (SEM), the maximum likelihood method is commonly used to estimate path coefficients; 2SLS is an alternative way of estimating them. The technique can also be applied in quasi-experimental studies.

Questions Answered:

How much can be budgeted in order to accurately estimate how much wheat is needed to produce bread?

What is the price of wheat? Is it on an upward trend?

Determine the final price for its bread.

Assumptions:

Models (equations) should be correctly identified.

The error variance of all the variables should be equal.

Error terms should be normally distributed.

It is assumed that outliers have been removed from the data.

Observations should be independent of each other.

Key concepts and terms:

Problematic causal variable: An endogenous variable, used as a predictor, whose error term is correlated with the error term of the dependent variable. In the first stage of the analysis, the problematic causal variable is replaced with a substitute variable.

Instruments: An instrumental variable is used to create a new variable that replaces the problematic variable.

Stages: The ordinary least squares method assumes that the error terms are independent of the predictor variables. When this assumption is broken, 2SLS helps to solve the problem. The analysis assumes that there is a secondary predictor (the instrument) that is correlated with the problematic predictor but not with the error term. Given such an instrumental variable, the analysis proceeds in two stages:

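In the first stage, the problematic predictor is regressed on the instrumental variable(s), together with any exogenous predictors, and the fitted values form the new variable.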

In the second stage, the model-estimated values from stage one are then used in place of the actual values of the problematic predictors to compute an OLS model for the response of interest.
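The two stages can be sketched by hand. The example below simulates a hypothetical system in which the predictor x is endogenous (it shares an unobserved component with the error of y) and z is a valid instrument (correlated with x, independent of the error). All variable names and parameter values are assumptions for illustration. OLS of y on x is biased, while the two-stage procedure approximately recovers the true slope of 2. Note that the standard errors from a literal second-stage OLS are not correct; dedicated 2SLS routines adjust them.

```python
import random
import statistics


def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on a single predictor x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    b = (sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
         / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b * x_bar, b


rng = random.Random(2020)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]          # instrument
common = [rng.gauss(0, 1) for _ in range(n)]     # unobserved confounder
x = [0.8 * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, common)]              # endogenous predictor
y = [1.0 + 2.0 * xi + 1.5 * ci + rng.gauss(0, 1) for xi, ci in zip(x, common)]  # true slope = 2

_, b_ols = simple_ols(x, y)

# Stage 1: regress the problematic predictor on the instrument and keep the fitted values.
a1, b1 = simple_ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: replace x by its fitted values and run OLS again.
_, b_2sls = simple_ols(x_hat, y)

print(f"OLS slope:  {b_ols:.3f}  (biased)")
print(f"2SLS slope: {b_2sls:.3f}  (true value 2.0)")
```

With these settings the OLS slope should land near 2 + 1.5/2.64 ≈ 2.57, because Cov(x, error) = 1.5 and Var(x) = 2.64.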

SPSS:

Not all statistical software performs this regression method. In SPSS, the analysis involves the following steps:

Click on the “SPSS” icon from the start menu.

Click on the “Open data” icon and select the data.

Click on the “Analyze” menu and select the “Regression” option.

Select “2-Stage Least Squares” from the Regression submenu. In the 2SLS dialog, select the dependent, explanatory and instrumental variables, then click the “OK” button. The results appear in the output window and are interpreted in the same way as the results of the OLS, MLE or WLS methods.

The term three-stage least squares (3SLS) refers to a method of estimation that combines system estimation, sometimes known as seemingly unrelated regression (SUR), with two-stage least squares estimation. It is a form of instrumental variables estimation that permits correlations of the unobserved disturbances across several equations, as well as restrictions among coefficients of different equations, and improves upon the efficiency of equation-by-equation estimation by taking such cross-equation correlations into account. Unlike the two-stage least squares (2SLS) approach for a system of equations, which estimates the coefficients of each structural equation separately, three-stage least squares estimates all coefficients simultaneously. It is assumed that each equation of the system is at least just-identified. Equations that are underidentified are disregarded in the 3SLS estimation.

Three-stage least squares originated in a paper by Arnold Zellner and Henri Theil (1962). In the classical specification, although the structural disturbances may be correlated across equations (contemporaneous correlation), it is assumed that within each structural equation the disturbances are both homoskedastic and serially uncorrelated. The classical specification thus implies that the disturbance covariance matrix within each equation is diagonal, whereas the entire system’s covariance matrix is nondiagonal.

The Zellner-Theil proposal for efficient estimation of this system is in three stages, wherein the first stage involves obtaining estimates of the residuals of the structural equations by two-stage least squares of all identified equations; the second stage involves computation of the optimal instrument, or weighting matrix, using the estimated residuals to construct the disturbance variance-covariance matrix; and the third stage is joint estimation of the system of equations using the optimal instrument. Although 3SLS is generally asymptotically more efficient than 2SLS, if even a single equation of the system is mis-specified, 3SLS estimates of coefficients of all equations are generally inconsistent.
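In matrix notation, the three stages lead to the standard textbook expression for the 3SLS estimator. As a sketch, with notation introduced only here: stack the M structural equations as $y = Z\delta + \varepsilon$, where $Z$ is block-diagonal in each equation's right-hand-side variables; let $W$ be the matrix of all exogenous variables in the system (the instruments), with $P_W = W(W'W)^{-1}W'$; and let $\hat{\Sigma}$ be the $M \times M$ cross-equation disturbance covariance matrix estimated from the first-stage 2SLS residuals (for example $\hat{\sigma}_{ij} = \hat{\varepsilon}_i'\hat{\varepsilon}_j / T$ with $T$ observations per equation):

$$\hat{\delta}_{3SLS} = \left[Z'\left(\hat{\Sigma}^{-1} \otimes P_W\right)Z\right]^{-1} Z'\left(\hat{\Sigma}^{-1} \otimes P_W\right)y$$

When $\hat{\Sigma}$ is diagonal, the weighting matrix is block-diagonal and the formula separates into equation-by-equation 2SLS, which matches the first equivalence condition noted by Zellner and Theil.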

The Zellner-Theil 3SLS estimator for the coefficient of each equation is shown to be asymptotically at least as efficient as the corresponding 2SLS estimator of that equation. However, Zellner and Theil also discuss a number of interesting conditions under which 3SLS and 2SLS estimators are equivalent. First, if the structural disturbances have no mutual correlations across equations (the variance-covariance matrix of the system disturbances is diagonal), then 3SLS estimates are identical to the 2SLS estimates equation by equation. Second, if all equations in the system are just-identified, then 3SLS is also equivalent to 2SLS equation by equation. Third, if a subset of m equations is overidentified while the remaining equations are just-identified, then 3SLS estimation of the m over-identified equations is equivalent to 2SLS of these m equations.

The 3SLS estimator has been extended to estimation of a nonlinear system of simultaneous equations by Takeshi Amemiya (1977) and Dale Jorgenson and Jean-Jacques Laffont (1975). An excellent discussion of 3SLS estimation, including a formal derivation of its analytical and asymptotic properties, and its comparison with full-information maximum likelihood (FIML), is given in Jerry Hausman (1983).

Amemiya, Takeshi. 1977. The Maximum Likelihood and the Nonlinear Three-stage Least Squares Estimator in the General Nonlinear Simultaneous Equation Model. Econometrica 45 (4): 955–968.

Dhrymes, Phoebus J. 1973. Small Sample and Asymptotic Relations Between Maximum Likelihood and Three Stage Least Squares Estimators. Econometrica 41 (2): 357–364.

Gallant, A. Ronald, and Dale W. Jorgenson. 1979. Statistical Inference for a System of Simultaneous, Non-linear, Implicit Equations in the Context of Instrumental Variable Estimation. Journal of Econometrics 11: 275–302.

Robinson, Peter M. 1991. Best Nonlinear Three-stage Least Squares Estimation of Certain Econometric Models. Econometrica 59 (3): 755–786.
