4.2 Summary

Course subject(s) 4. Best Linear Unbiased Estimation (BLUE)

Best linear unbiased estimation

BLUE is an acronym for Best Linear Unbiased Estimation. Although it is based on a fundamentally different background than (weighted) least squares (WLS), it can be regarded as a further refinement of WLS.


Comparing the weighted least squares estimator:

\[ \hat{\underline{x}}_{\text{WLS}}= (A^T W A)^{-1} A^T W \underline{y}\]

with the estimator of the vector of unknowns following from BLUE:

\[\hat{\underline{x}}_{\text{BLU}}= (A^T Q_{yy}^{-1} A)^{-1} A^T Q_{yy}^{-1} \underline{y}\]

we see that the weight matrix \(W\) is now replaced by the inverse of the covariance matrix \(Q_{yy}\).
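To make this concrete, here is a minimal numerical sketch in Python/NumPy. The model and the numbers in `A`, `y`, and `Qyy` are hypothetical, chosen only to illustrate the formula above.

```python
import numpy as np

# Hypothetical example: fit a line y = x0 + x1*t to m = 4 observations.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])   # design matrix (m x n)
y = np.array([1.1, 2.0, 2.8, 4.2])          # observation vector

# Assumed covariance matrix of the observations: uncorrelated here,
# with a different precision per observation.
Qyy = np.diag([0.01, 0.04, 0.04, 0.01])

# BLUE: the WLS weight matrix W is replaced by the inverse of Qyy.
W = np.linalg.inv(Qyy)
x_blu = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(x_blu)   # BLU-estimate of the unknowns
```

Note that `np.linalg.solve` is used rather than explicitly inverting \(A^T Q_{yy}^{-1} A\); solving the normal equations directly is the numerically preferred way to evaluate such expressions.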


This makes sense intuitively: suppose we have only one observation; the covariance matrix is then a scalar containing the variance of that observation. Taking the inverse of that variance implies that a smaller variance (i.e., an observation with better precision) yields a larger weight. Similarly, for a vector of \(m\) observations, where the covariance matrix is of size \(m\times m\), taking the inverse of this matrix yields the proper weight matrix \(W\).
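For uncorrelated observations this picture is easy to verify: the covariance matrix is diagonal, so its inverse simply contains the weights \(1/\sigma_i^2\). A quick check, with hypothetical variances:

```python
import numpy as np

sigma2 = np.array([0.01, 0.04, 0.09])   # hypothetical variances
W = np.linalg.inv(np.diag(sigma2))      # inverse covariance matrix
print(np.diag(W))   # [100.  25.  11.1...]: smaller variance, larger weight
```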


Taking this particular weight matrix, i.e., \(W=Q_{yy}^{-1}\), has a special meaning: it gives the estimator its ‘Best’ property. Using this weight matrix, we obtain a linear estimator with minimal variance. In other words, with this particular weight matrix we get the best possible estimator among all linear estimators, where ‘best’ stands for optimal precision or minimal variance.


Given the BLU-estimator \(\hat{\underline{x}}\), we can also find the BLU-estimators for \(\hat{\underline{y}} = A\hat{\underline{x}}\) and for \(\hat{\underline{e}} = \underline{y}-\hat{\underline{y}}\):

\[\hat{\underline{y}}= A(A^T Q_{yy}^{-1} A)^{-1} A^T Q_{yy}^{-1} \underline{y}\]

\[\hat{\underline{e}}= \underline{y}-A(A^T Q_{yy}^{-1} A)^{-1} A^T Q_{yy}^{-1} \underline{y}\]
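In the running sketch from above (reusing the hypothetical `A`, `y`, and `x_blu`), both follow in one line each:

```python
# Continuation of the sketch above (A, y, and x_blu already defined).
y_hat = A @ x_blu    # BLU-estimator of the observations
e_hat = y - y_hat    # BLU-estimator of the residuals
```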


BLUE decomposed

In BLUE, Best Linear Unbiased Estimation, the parts of the acronym ‘B’, ‘L’, and ‘U’ each have a specific meaning.


· ‘Linear’ means that there is a linear (matrix) relation between the variables. As discussed in Module 4.1, such linear relations imply that if a normally distributed vector \(\underline{u}\) is multiplied by a matrix \(L\), the product is normally distributed as well. In other words, with \[\underline{v}=L\underline{u},\] the vector \(\underline{v}\) is also normally distributed.


· ‘Unbiased’ means that the expected value of the estimator is equal to the value of the parameter it estimates. In other words: \[E\{\hat{\underline{x}}\}= x.\]


· ‘Best’ means that the estimator has minimum variance (best precision) among all possible linear unbiased estimators. (Note that for unbiased estimators this is equivalent to minimum mean squared error, \(E\{\|\hat{\underline{x}}-x\|^2\}\).) The simulation sketch below illustrates the ‘unbiased’ and ‘best’ properties.
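The following simulation sketch checks these properties empirically. It is a hypothetical setup (the same line-fit model as above): ordinary least squares with \(W=I\) serves as a competing linear unbiased estimator, and the BLUE should show the smaller sample variances.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model: line fit with known true parameters.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
Qyy = np.diag([0.01, 0.04, 0.04, 0.01])
x_true = np.array([1.0, 1.0])

# Both estimators are linear maps applied to y.
W = np.linalg.inv(Qyy)
L_blu = np.linalg.solve(A.T @ W @ A, A.T @ W)   # BLUE
L_ols = np.linalg.solve(A.T @ A, A.T)           # ordinary LS (W = I), also linear and unbiased

x_blu = np.array([L_blu @ (A @ x_true + rng.multivariate_normal(np.zeros(4), Qyy))
                  for _ in range(20000)])
x_ols = np.array([L_ols @ (A @ x_true + rng.multivariate_normal(np.zeros(4), Qyy))
                  for _ in range(20000)])

print(x_blu.mean(axis=0), x_ols.mean(axis=0))   # both ~ x_true: unbiased
print(x_blu.var(axis=0), x_ols.var(axis=0))     # BLUE variances are the smaller ones: 'best'
```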


Quality expressions

The quality of the estimator is expressed by its covariance matrix. For the ‘best linear unbiased’ estimator \(\hat{\underline{x}}\) we find

\[Q_{\hat{x}\hat{x}} = (A^T Q_{yy}^{-1} A)^{-1}\]
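In the running numerical sketch (same hypothetical `A` and `Qyy`), this covariance matrix and the standard deviations of the estimated parameters follow directly:

```python
# Continuation of the sketch above (A and Qyy already defined).
Qxx = np.linalg.inv(A.T @ np.linalg.inv(Qyy) @ A)   # covariance matrix of x_hat
print(np.sqrt(np.diag(Qxx)))                        # standard deviations of the estimates
```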

In summary, by applying the BLUE method, we can compute the best estimator among all linear unbiased estimators, where ‘best’ is quantitatively expressed via the covariance matrix.


Additional note on the linearity condition of BLUE 

The BLUE estimator is the best (or minimum variance) estimator among all linear unbiased estimators. If we drop the condition of linearity, BLUE is not necessarily the best: there may be non-linear estimators with even better precision than the BLUE. However, it can be proven that, in the case of normally distributed observations, the BLUE is also the best among all possible estimators. So we can say: for observations \(\underline{y}\) that are normally distributed, the BLUE is the BUE (best unbiased estimator).


Additional note on the maximum likelihood estimator

(Weighted) least squares and best linear unbiased estimation are two different estimation principles, with the very nice property that they become identical when \(W=Q_{yy}^{-1}\). There is one more estimator we would like to mention here: the maximum likelihood (ML) estimator. It is based on maximizing the so-called likelihood function of a given observation vector \(y\). For this purpose, the general structure of the probability density function (PDF) of \(\underline{y}\) must be known, except for the \(n\) unknown parameters \(x\). In our case, when \(\underline{y}\) is assumed to be normally distributed, the maximum likelihood estimator is identical to BLUE.
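As a brief sketch of why this holds (for the normal case, with known \(Q_{yy}\)): the PDF of \(\underline{y}\sim N(Ax,\,Q_{yy})\) depends on \(x\) only through the quadratic form in its exponent, so maximizing the likelihood amounts to

\[\hat{x}_{\text{ML}} = \arg\max_x \exp\left(-\tfrac{1}{2}(y-Ax)^T Q_{yy}^{-1}(y-Ax)\right) = \arg\min_x (y-Ax)^T Q_{yy}^{-1}(y-Ax),\]

which is exactly the weighted least-squares criterion with \(W=Q_{yy}^{-1}\), i.e., the BLUE.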

Observation Theory: Estimating the Unknown by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://ocw.tudelft.nl/courses/observation-theory-estimating-unknown.