Guide to Statistics: Probability & Statistics Facts, Formulae and Information

The statistical problem solving cycle

Data are numbers in context and the goal of statistics is to get information from those data, usually through problem solving. A procedure or paradigm for statistical problem solving and scientific enquiry is illustrated in the diagram. The dotted line means that, following discussion, the problem may need to be re-formulated and at least one more iteration completed.

Descriptive statistics

Given a sample of $$n$$ observations $$x_1, x_2, \ldots, x_n$$ we define the sample mean to be \[\bar{x}=\frac{x_1+x_2+\cdots+x_n}{n}=\frac{\sum{x_i}}{n}\] and the corrected sum of squares by \[s_{xx}=\sum{\left(x_i-\bar{x}\right)^2}=\sum{x_i^2}-n\bar{x}^2=\sum{x_i^2}-\frac{\left(\sum x_i\right)^2}{n}.\] The quantity $$\frac{s_{xx}}{n}$$ is sometimes called the mean squared deviation. An unbiased estimator of the population variance, $$\sigma^2$$, is provided by calculating $$s^2=\frac{s_{xx}}{(n-1)}$$. The sample standard deviation is $$s$$. In calculating $$s^2$$, the divisor $$(n-1)$$ is called the degrees of freedom (df). Note that $$s$$ is also sometimes written $$\hat{\sigma}$$.
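As an illustration, the definitions above can be sketched in a few lines of Python. The function name `describe` and the sample values are assumptions made for this example only; the sketch computes $$s_{xx}$$ both from the deviations and from the shortcut formula so the two forms can be compared.

```python
import math


def describe(xs):
    """Return basic descriptive statistics for a list of numbers (needs n >= 2)."""
    n = len(xs)
    mean = sum(xs) / n
    # Corrected sum of squares from the definition: sum of squared deviations.
    sxx = sum((x - mean) ** 2 for x in xs)
    # Equivalent shortcut form: sum(x^2) - (sum x)^2 / n.
    sxx_shortcut = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s2 = sxx / (n - 1)  # unbiased estimate of the population variance
    return {
        "n": n,
        "mean": mean,
        "sxx": sxx,
        "sxx_shortcut": sxx_shortcut,
        "mean_squared_deviation": sxx / n,
        "s2": s2,
        "s": math.sqrt(s2),
        "df": n - 1,
    }


# Hypothetical sample, chosen only to illustrate the calculation.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(sample).items():
    print(name, value)
```

For large values with small spread, the shortcut form can lose precision in floating-point arithmetic, so the deviation form is usually the safer choice in code.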

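If the sample data are ordered from smallest to largest then the minimum is the smallest value, the lower quartile is the value one quarter of the way through the ordered data, the median is the middle value, the upper quartile is the value three quarters of the way through the ordered data, and the maximum is the largest value.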

These five values constitute a five-number summary of the data. They can be represented diagrammatically by a box-and-whisker plot, commonly called a boxplot.

A boxplot
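A five-number summary can be computed with Python's standard `statistics` module, as in the minimal sketch below. Conventions for locating the quartiles differ between textbooks and software; the choice of `method="inclusive"` here is an assumption, not necessarily the convention this guide intends, and the function name `five_number_summary` and the data are illustrative only.

```python
import statistics


def five_number_summary(xs):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(xs)
    # quantiles(..., n=4) returns the three cut points that split the
    # data into quarters: lower quartile, median, upper quartile.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical data, used only to show the call.
print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))
```

These five values are the ones a boxplot draws: the box spans the quartiles, with a line at the median, and the whiskers extend towards the minimum and maximum.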

Grouped frequency data

If the data are given in the form of a grouped frequency distribution where we have $$f_i$$ observations in an interval whose mid-point is $$x_i$$ then, if $$\sum{f_i}=n$$, \[ \bar{x}=\frac{\sum{f_i x_i}}{\sum{f_i}}=\frac{\sum{f_i x_i}}{n}\] and \[s_{xx}=\sum{f_i\left(x_i-\bar{x}\right)^2}=\sum{f_i x_i^2}-\frac{\left(\sum f_i x_i\right)^2}{n}.\]
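The grouped formulae can be sketched in the same way. In this example the function name `grouped_summary`, the mid-points and the frequencies are all hypothetical; each interval is represented by its mid-point, so the results are approximations to the values that the raw data would give.

```python
def grouped_summary(midpoints, freqs):
    """Return (n, mean, sxx) for grouped frequency data."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    mean = sum_fx / n
    # Shortcut form: sum(f x^2) - (sum f x)^2 / n.
    sxx = sum_fx2 - sum_fx ** 2 / n
    return n, mean, sxx


# Hypothetical intervals 0-10, 10-20, 20-30 with mid-points 5, 15, 25.
n, mean, sxx = grouped_summary([5, 15, 25], [2, 5, 3])
print(n, mean, sxx)
print(sxx / (n - 1))  # estimate of the variance, s^2
```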
