

10 The statistics of multiple paleointensity estimates

Averaging and weighting

Statistic: $$N$$

The number of paleointensity estimates to be analyzed.

 

Statistic: $$B_j$$

The value of the $$j^{th}$$ paleointensity estimate, where $$j=1$$ to $$N$$.

 

Statistic: $$m$$
Report to 1 d.p.

The arithmetic mean of the $$N$$ paleointensity estimates \[ m=\frac{ \sum\limits_{j=1}^{N}{B_j} } {N}. \]

 

Statistic: $$s$$
Report to 1 d.p.

The standard deviation of the $$N$$ paleointensity estimates. \[ s=\left( \frac{ \sum\limits_{j=1}^{N}{(B_j-m)^2 } }{N-1} \right)^{\frac{1}{2}} \]

 

Statistic: $$m_w$$
Report to 1 d.p.

The weighted mean of the $$N$$ paleointensity estimates. \[ m_w=\frac{\sum\limits_{j=1}^{N}{W_jB_j}}{\sum\limits_{j=1}^{N}{W_j}}, \] where $$W_j$$ is the weight on the $$j^{th}$$ paleointensity estimate.

 

Statistic: $$s_w$$
Report to 1 d.p.

The weighted standard deviation of the $$N$$ paleointensity estimates (Heckert and Filliben, 2003). \[ s_w=\left(\frac{N\sum\limits_{j=1}^{N}{W_j\left(B_j-m_w\right)^2}}{\left(N-1\right)\sum\limits_{j=1}^{N}{W_j}}\right)^{\frac{1}{2}} \]

Useful Note...
Several options are available to act as weights. Two options that have been used in the literature are the quality and weighting factors, $$q$$ and $$w$$, respectively. Their use as weights, however, is not appropriate. As is outlined in Section 3, $$w\propto{}q$$, and $$q$$ is itself a function of the Arai plot slope ($$\left|b\right|$$). Hence, both $$q$$ and $$w$$ are proportional to the paleointensity estimate. If $$q$$ or $$w$$ is used as a weight ($$W_j$$) then $$W_j\propto{}B_j$$ (i.e., higher paleointensity estimates will tend to have larger weights), which can bias the weighted mean to higher values. Such dependencies should be carefully considered when choosing a statistic to use for weighting.
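The averaging statistics defined in this section can be sketched in a few lines of Python using only the standard library. The function names and the five example estimates are hypothetical and chosen purely for illustration. The last comparison shows the effect described in the note: when the weights are proportional to the estimates themselves, the weighted mean is pulled above the arithmetic mean.

```python
import math

def mean(values):
    """Arithmetic mean m of the N estimates."""
    return sum(values) / len(values)

def std(values):
    """Standard deviation s, with N - 1 in the denominator."""
    n = len(values)
    m = mean(values)
    return math.sqrt(sum((b - m) ** 2 for b in values) / (n - 1))

def weighted_mean(values, weights):
    """Weighted mean m_w."""
    return sum(w * b for w, b in zip(weights, values)) / sum(weights)

def weighted_std(values, weights):
    """Weighted standard deviation s_w, transcribed from the expression above."""
    n = len(values)
    m_w = weighted_mean(values, weights)
    numerator = n * sum(w * (b - m_w) ** 2 for w, b in zip(weights, values))
    denominator = (n - 1) * sum(weights)
    return math.sqrt(numerator / denominator)

# Hypothetical paleointensity estimates (e.g., in microtesla).
B = [30.0, 35.0, 40.0, 45.0, 50.0]

m = mean(B)
s = std(B)

# Equal weights: m_w and s_w reduce to m and s.
equal = [1.0] * len(B)

# Weights proportional to the estimates, as would arise if W_j scaled with B_j.
proportional = list(B)

print("m, s:", m, s)
print("equal weights:", weighted_mean(B, equal), weighted_std(B, equal))
print("W_j proportional to B_j:", weighted_mean(B, proportional))
```

With equal weights the weighted statistics match the unweighted ones, which is a quick sanity check on any implementation of $$m_w$$ and $$s_w$$.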

Measures of scatter

Statistic: $$\delta{B}(\%)$$
Report to 1 d.p.

The standard deviation as a percentage of the mean value. Often referred to as the scatter. \[ \delta{B}(\%)=\frac{s}{m}\times100 \]

 

Statistic: $$\delta{B_N}(\%)$$
Report to 1 d.p.

When dealing with small numbers of data (i.e., small $$N$$), both $$m$$ and $$s$$ are inherently uncertain and these uncertainties propagate into measures of scatter. To account for this, Paterson et al. (2010a) proposed an adjustment to $$\delta{B}(\%)$$ to determine the upper 95% confidence limit ($$\delta{B_N}(\%)$$). Using this approach we can say, with 95% confidence, that the true scatter of the data is less than $$\delta{B_N}(\%)$$. This allows for a fairer comparison of data sets with different $$N$$. \[ \delta{B_N} (\%)=\left|\frac{\sqrt{N}}{ t_{nc_{ \left(1-\alpha;~(N-1);~\frac{m\sqrt{N}}{s} \right) } } }\right|\times{100}, \] where $$t_{nc}$$ is the noncentral $$t$$ critical value for the $$(1-\alpha)$$ confidence level for $$(N-1)$$ degrees of freedom and with noncentrality parameter $$\frac{m\sqrt{N}}{s}$$.

Numerical Tip...
Different software packages use different conventions for the input of $$(1-\alpha)$$ into the calculation of the noncentral $$t$$ critical value. For example, the MATLAB command nctinv() takes $$\alpha = 0.95$$, while other functions may use $$\alpha = 0.05$$. For $$N-1=1$$ and a noncentrality parameter of unity (1), the noncentral $$t$$ critical value at the 95% confidence level is -1.193.
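As a rough illustration of the two scatter statistics, the following Python sketch evaluates $$\delta{B}(\%)$$ and $$\delta{B_N}(\%)$$ with SciPy's noncentral $$t$$ distribution (scipy.stats.nct). The function names and the example values of $$m$$, $$s$$ and $$N$$ are hypothetical. Passing a probability of 0.05 to nct.ppf is an assumption about how the 95% level maps onto SciPy's convention; with that choice the final line should come out close to the check value of -1.193 quoted above.

```python
import math
from scipy.stats import nct  # noncentral t distribution

def delta_b(m, s):
    """Scatter: the standard deviation as a percentage of the mean."""
    return s / m * 100.0

def delta_b_n(m, s, n, prob=0.05):
    """Upper bound on the scatter, transcribed from the expression above.

    prob is the cumulative probability handed to the inverse CDF.
    Assumption: the 95% confidence level corresponds to prob = 0.05
    in SciPy's nct.ppf(q, df, nc).
    """
    nc = m * math.sqrt(n) / s                # noncentrality parameter
    t_crit = nct.ppf(prob, n - 1, nc)        # noncentral t critical value
    return abs(math.sqrt(n) / t_crit) * 100.0

# Hypothetical site mean and standard deviation (same units).
m, s = 40.0, 8.0

print("delta B (%):", delta_b(m, s))
for n in (3, 5, 20):
    print("N =", n, "delta B_N (%):", delta_b_n(m, s, n))

# Critical value for N - 1 = 1 and a noncentrality parameter of 1.
print("check value:", nct.ppf(0.05, 1, 1.0))
```

For a fixed $$m$$ and $$s$$, $$\delta{B_N}(\%)$$ shrinks toward $$\delta{B}(\%)$$ as $$N$$ increases, which is the behaviour that makes it useful for comparing data sets of different size.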

Statistical tests for scatter

Statistic: $$p_{\delta_{B}}$$
Report to 3 d.p.

An alternative approach is to determine the probability that the scatter (i.e., $$\delta{B} (\%)$$) is less than some critical value, $$\delta{B_{max}}$$ (Paterson et al., 2010a). By adopting this approach, selection based on scatter can be performed as a statistical test, whereby we test the null hypothesis that our measured scatter is less than or equal to $$\delta{B_{max}}$$. The probability $$p_{\delta_{B}}$$ that this is the case is given by \[ p_{\delta_{B}}=F\left(\frac{\sqrt{N}}{\delta{B_{max}}};~(N-1)~;~\frac{m\sqrt{N}}{s} \right), \] where $$F()$$ is the noncentral $$t$$ cumulative distribution function and $$\delta{B_{max}}$$ is given as a fraction and not a percentage (e.g., 0.25 as opposed to 25%). If $$p_{\delta_{B}} \leq 0.05$$ we cannot reject the null hypothesis that our measured scatter is less than or equal to $$\delta{B_{max}}$$ (at the 5% significance level). If, however, $$p_{\delta_{B}} > 0.05$$ we can reject the null hypothesis and our measured scatter is most likely greater than $$\delta{B_{max}}$$. The two outlined approaches are equivalent, with $$\delta{B_N} (\%)$$ being the value of $$\delta{B_{max}}$$ that yields $$p_{\delta_{B}}=0.05$$.
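The test can be sketched in Python with SciPy's noncentral $$t$$ cumulative distribution function (scipy.stats.nct.cdf). The function name, the example values and the trial limits are hypothetical. The final lines check the stated link to $$\delta{B_N}$$, under the assumption that the 95% critical value corresponds to a probability of 0.05 in nct.ppf.

```python
import math
from scipy.stats import nct  # noncentral t distribution

def p_delta_b(m, s, n, delta_b_max):
    """p_{delta B}, transcribed from the expression above.

    delta_b_max is a fraction (e.g., 0.25), not a percentage.
    """
    nc = m * math.sqrt(n) / s                        # noncentrality parameter
    return nct.cdf(math.sqrt(n) / delta_b_max, n - 1, nc)

# Hypothetical site: N = 5, mean 40.0, standard deviation 8.0 (same units).
m, s, n = 40.0, 8.0, 5

for limit in (0.25, 0.50, 0.75):
    p = p_delta_b(m, s, n, limit)
    verdict = "cannot reject" if p <= 0.05 else "reject"
    print("delta B_max =", limit, "p =", round(p, 3), "->", verdict)

# delta B_N as a fraction, using the 0.05 point of the same distribution
# (assumed convention), should give p_{delta B} = 0.05 when used as the limit.
nc = m * math.sqrt(n) / s
delta_b_n = abs(math.sqrt(n) / nct.ppf(0.05, n - 1, nc))
print("delta B_N (fraction):", delta_b_n)
print("p at delta B_N:", p_delta_b(m, s, n, delta_b_n))
```

Because $$p_{\delta_{B}}$$ decreases as $$\delta{B_{max}}$$ increases, a looser limit is easier to pass, and the limit at which the verdict changes is $$\delta{B_N}$$.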

Statistic: $$p_{s}$$
Report to 3 d.p.

Some studies prefer to select data using an absolute limit on the standard deviation ($$s_{max}$$) of an average paleointensity estimate, most notably when the estimate is low and the relative scatter may therefore be high. Given that, under the assumption of normality, the estimated variance follows a chi-squared distribution, the probability that $$s$$ is less than or equal to $$s_{max}$$ is given by \[ p_s=F_{\chi^2}\left(\frac{(N-1)s_{max}^2}{s^2};~(N-1)\right), \] where $$F_{\chi^2}()$$ is the chi-squared cumulative distribution function with $$N-1$$ degrees of freedom. If $$p_{s} \leq 0.05$$ we cannot reject the null hypothesis that our measured scatter is less than or equal to $$s_{max}$$ (at the 5% significance level). If, however, $$p_{s} > 0.05$$ we can reject the null hypothesis and our measured scatter is most likely greater than $$s_{max}$$.
It should be noted that, in cases where $$\delta{B_{max}}=\frac{s_{max}}{m}$$, $$p_s$$ is always less than $$p_{\delta{B}}$$. This is due to the fact that $$p_{\delta{B}}$$ accounts for sample-size-related uncertainty in both $$m$$ and $$s$$, whereas $$p_s$$ accounts for sample-size-related uncertainty in $$s$$ only.

 
