估计量的偏差

在统计学中，估计量的偏差（或偏差函数）是此估计量的期望值与估计参数的真值之差。偏差为零的估计量或决策规则称为无偏的。否则该估计量是有偏的。在统计中，“偏差”是一个函数的客观陈述。

偏差也可以相对于中位数来衡量，而非相对于均值（期望值），在这种情况下为了与通常的“均值”无偏性区别，称作“中值”无偏。偏差与一致性相关联，一致估计量都是收敛并且渐进无偏的（因此会收敛到正确的值），虽然一致序列中的个别估计量可能是有偏的（只要偏差收敛于零）；参见偏差与一致性。

当其他量相等时，无偏估计量比有偏估计量更好一些，但在实践中，并不是所有其他统计量的都相等，于是也经常使用有偏估计量，一般偏差较小。当使用一个有偏估计量时，也会估计它的偏差。有偏估计量可能用于以下原因：由于如果不对总体进一步假设，无偏估计量不存在或很难计算（如标准差的无偏估计（英语：unbiased estimation of standard deviation））；由于估计量是中值无偏的，却不是均值无偏的（或反之）；由于一个有偏估计量较之无偏估计量（特别是收缩估计量（英语：shrinkage estimator））可以减小一些损失函数（尤其是均方差）；或者由于在某些情况下，无偏的条件太强，这种情况无偏估计量不是必要的。此外，在非线性变换下均值无偏性不会保留，不过中值无偏性会保留（参见变换的效应）；例如样本方差是总体方差的无偏估计量，但它的平方根标准差则是总体标准差的有偏估计量。下面会进行说明。

定义

设我们有一个参数为实数 θ 的概率模型，产生观测数据的概率分布 $P_{\theta }(x)=P(x\mid \theta )$ ，而统计量 ${\hat {\theta }}$ 是基于任何观测数据 $x$ 下 θ 的估计量。也就是说，我们假定我们的数据符合某种未知分布 $P_{\theta }(x)=P(x\mid \theta )$ （其中 θ 是一个固定常数，而且是该分布的一部分，但具体值未知），于是我们构造估计量 ${\hat {\theta }}$ ，该估计量将观测数据与我们希望的接近 θ 的值对应起来。因此这个估量的（相对于参数 θ的）偏差定义为

\operatorname {Bias} _{\theta }[\,{\hat {\theta }}\,]=\operatorname {E} _{\theta }[\,{\hat {\theta }}\,]-\theta =\operatorname {E} _{\theta }[\,{\hat {\theta }}-\theta \,],

其中 $\operatorname {E} _{\theta }$ 表示分布 $P_{\theta }(x)=P(x\mid \theta )$ 的期望值，即对所有可能的观测值 $x$ 取平均。由于 θ 对于条件分布 $P(x\mid \theta )$ 是可测的，就有了第二个等号。

对于参数 θ 的所有值的偏差都等于零的估计量称为无偏估计量。

在一次关于估计量性质的模拟实验中，估计量的偏差可以用平均有符号离差（英语：mean signed difference）来评估。

例子

样本方差

随机变量的样本方差从两方面说明了估计量偏差：首先，自然估计量（naive estimator）是有偏的，可以通过比例因子校正；其次，无偏估计量的均方差（MSE）不是最优的，可以用一个不同的比例因子来最小化，得到一个比无偏估计量的MSE更小的有偏估计量。

具体地说，自然估计量就是将离差平方和加起来然后除以 n，是有偏的。不过除以 n − 1 会得到一个无偏估计量。相反，MSE可以通过除以另一个数来最小化（取决于分布），但这会得到一个有偏估计量。这个数总会比 n − 1 大，所以这就叫做收缩估计量（英语：shrinkage estimator），因为它把无偏估计量向零“收缩”；对于正态分布，最佳值为 n + 1。

设 X₁, ..., X_n 是期望为 μ、方差为 σ² 的独立同分布（i.i.d.）随机变量。如果样本均值与未修正样本方差定义为

{\overline {X}}={\frac {1}{n}}\sum _{i=1}^{n}X_{i},\qquad S^{2}={\frac {1}{n}}\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\,\right)^{2},

则 S² 是 σ² 的一个有偏估计量，因为

{\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} \left[{\frac {1}{n}}\sum _{i=1}^{n}{\big (}X_{i}-{\overline {X}}{\big )}^{2}\right]=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}{\bigg (}(X_{i}-\mu )-({\overline {X}}-\mu ){\bigg )}^{2}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}{\bigg (}(X_{i}-\mu )^{2}-2({\overline {X}}-\mu )(X_{i}-\mu )+({\overline {X}}-\mu )^{2}{\bigg )}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+{\frac {1}{n}}({\overline {X}}-\mu )^{2}\sum _{i=1}^{n}1{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+{\frac {1}{n}}({\overline {X}}-\mu )^{2}\cdot n{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]\end{aligned}}

换句话说，未修正的样本方差的期望值不等于总体方差 σ²，除非乘以归一化因子。而样本均值是总体均值 μ 的无偏^[1]估计量。

S² 是有偏的原因源于样本均值是 μ 的普通最小二乘（英语：ordinary least squares）（OLS）估计量这个事实： ${\overline {X}}$ 是令 $\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}$ 尽可能小的数。也就是说，当任何其他数代入这个求和中时，这个和只会增加。尤其是，在选取 $\mu \neq {\overline {X}}$ 就会得出，

{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}<{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2},

于是

{\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}{\bigg ]}<\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}{\bigg ]}=\sigma ^{2}.\end{aligned}}

注意到，通常的样本方差定义为

s^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(X_{i}-{\overline {X}}\,)^{2},

而这时总体方差的无偏估计量。可以由下式看出：

\operatorname {E} {\big [}({\overline {X}}-\mu )^{2}{\big ]}={\frac {1}{n}}\sigma ^{2}.

方差的有偏（未修正）与无偏估计之比称为贝塞尔修正（英语：Bessel's correction）。

参见

参考文献

Brown, George W. "On Small-Sample Estimation." The Annals of Mathematical Statistics, vol. 18, no. 4 (Dec., 1947), pp. 582–585.
JSTOR 2236236
.
Lehmann, E. L.（英语：Erich Leo Lehmann） "A General Concept of Unbiasedness" The Annals of Mathematical Statistics, vol. 22, no. 4 (Dec., 1951), pp. 587–592.
JSTOR 2236928
.
Allan Birnbaum（英语：Allan Birnbaum）, 1961. "A Unified Theory of Estimation, I", The Annals of Mathematical Statistics, vol. 32, no. 1 (Mar., 1961), pp. 112–135.
Van der Vaart, H. R., 1961. "Some Extensions of the Idea of Bias" The Annals of Mathematical Statistics, vol. 32, no. 2 (June 1961), pp. 436–447.
Pfanzagl, Johann. 1994. Parametric Statistical Theory. Walter de Gruyter.
Stuart, Alan; Ord, Keith; Arnold, Steven [F.]. Classical Inference and the Linear Model. Kendall's Advanced Theory of Statistics 2A. Wiley. 2010. ISBN 0-4706-8924-2. .
Voinov, Vassily [G.]; Nikulin, Mikhail [S.]. Unbiased estimators and their applications. 1: Univariate case. Dordrect: Kluwer Academic Publishers. 1993. ISBN 0-7923-2382-3.
Voinov, Vassily [G.]; Nikulin, Mikhail [S.]. Unbiased estimators and their applications. 2: Multivariate case. Dordrect: Kluwer Academic Publishers. 1996. ISBN 0-7923-3939-8.
Klebanov, Lev [B.]; Rachev, Svetlozar [T.]; Fabozzi, Frank [J.]. Robust and Non-Robust Models in Statistics. New York: Nova Scientific Publishers. 2009. ISBN 978-1-60741-768-2.

外部链接

Hazewinkel, Michiel (编), Unbiased estimator, 数学百科全书, Springer, 2001, ISBN 978-1-55608-010-4

^ Richard Arnold Johnson; Dean W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 2007 [10 August 2012]. ISBN 978-0-13-187715-3. （原始内容存档于2016-05-29）.

[JohnsonWichern2007-1] Richard Arnold Johnson; Dean W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 2007 [10 August 2012]. ISBN 978-0-13-187715-3. （原始内容存档于2016-05-29）.

[1]