
Eurordis - Rare Disease Europe


Lesson 3: Variability in Observations


The variability of an observation results partly from the heterogeneity of the population (the biological variance) and partly from the conditions of measurement (the experimental variance).

Statisticians treat these two types of variability differently, although they cannot be told apart in the results of a given observation.

The Many-Faceted Biological Variability

Among the factors influencing weight, an analysis of the gender factor would show that:

  • the mean weight of females (68.3 kg) differs from the mean weight of males (79.4 kg);
  • the variability is smaller within each subpopulation (SD = 12.7 kg in females, SD = 15.1 kg in males) than in the global population (SD = 18.1 kg).

These two statements show that part of the overall variance is linked to the gender factor.
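The same effect can be reproduced on a toy data set. The sketch below uses eight made-up weights (hypothetical values chosen for easy arithmetic, not Metroda measurements) and Python's standard `statistics` module to compare the SD within each gender group with the SD of the pooled data.

```python
import statistics

# Hypothetical toy weights in kg (illustrative only, not the lesson's data)
females = [60, 65, 70, 75]
males = [75, 80, 85, 90]
everyone = females + males

def describe(values):
    """Return (mean, sample SD) for a list of weights."""
    return statistics.mean(values), statistics.stdev(values)

f_mean, f_sd = describe(females)
m_mean, m_sd = describe(males)
all_mean, all_sd = describe(everyone)

print(f"Females: {f_mean:.1f} ± {f_sd:.1f} kg")    # Females: 67.5 ± 6.5 kg
print(f"Males:   {m_mean:.1f} ± {m_sd:.1f} kg")    # Males:   82.5 ± 6.5 kg
print(f"All:     {all_mean:.1f} ± {all_sd:.1f} kg")  # All:     75.0 ± 10.0 kg
```

Each group's SD (about 6.5 kg) is smaller than the pooled SD (10 kg) because the pooled data also contain the 15 kg gap between the two group means: part of the overall variance is "explained" by the grouping factor.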

Similar subpopulation analyses could be done by taking into account height, birth weight, etc.

Body weight depends on:

  • the weight of both parents,
  • nutritional habits,
  • physical activity,
  • lifestyle,
  • gender,
  • height, etc.

Weight in the global population: 74.1 ± 18.1 kg (M ± SD)

Weight in females: 68.3 ± 12.7 kg; weight in males: 79.4 ± 15.1 kg

Experimental Variability

The results were obtained using electronic scales built into the floor of passageways. The measurements may be influenced by the quality of the scales, the momentum of people walking over them, the day of measurement, etc. These sources of experimental variability are not linked to the parameter itself but to the conditions of its measurement. Whereas biological variability is inherent to the heterogeneity of the living world, experimental variance is unwanted background (called noise) that mixes with the biological variance in the results. Efforts must therefore be made to limit the experimental variance.

To limit the experimental variance caused by people's momentum during weighing, the statisticians suggested moving the scales to a place where passengers stand still: in front of the ticket machine.

Once again, after one day of measurements, the results were:

Weight: 73.8 ± 16.9 kg (n = 412).

The reduction of the SD (from 18.1 kg to 16.9 kg) suggests that the momentum factor contributed to the experimental variance.
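A rough order of magnitude for the momentum noise can be derived from the two SDs. This sketch rests on assumptions the lesson does not make: that the noise is independent of the true weights and simply adds to them (so that variances, not SDs, add), and that both days sampled comparable populations.

```python
import math

# SDs reported in the lesson (kg)
sd_with_momentum = 18.1  # scales in the passageways
sd_standing = 16.9       # scales in front of the ticket machine

# Assumption: independent additive noise, so
#   SD_with^2 = SD_standing^2 + SD_momentum^2
momentum_variance = sd_with_momentum**2 - sd_standing**2
momentum_sd = math.sqrt(momentum_variance)

print(f"Extra variance: {momentum_variance:.1f} kg^2")        # 42.0 kg^2
print(f"Implied momentum noise SD: {momentum_sd:.1f} kg")     # 6.5 kg
```

Subtracting the SDs directly (18.1 - 16.9 = 1.2 kg) would greatly understate the noise: under this additive model, a noise source with an SD of about 6.5 kg only raises the total SD by about 1.2 kg.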

Biological variance is unavoidable and may be explained by several factors, whereas experimental variance is a source of uncertainty that has to be reduced.

Accuracy of Evaluations: SEM 

These new results differ not only in variability but also slightly in the mean. The subway's representatives ask the statisticians why:

– It is impossible to measure every individual, and because of the variability of the population, you will never obtain exactly the same result twice.

– We can accept some approximation, but to what extent? We have to order lifts, and the manufacturer needs to know the mean weight of our users.

– Fortunately, the Gaussian toolbox contains another tool: the Standard Error of the Mean (SEM), which measures the accuracy of the mean.

Its formula is: $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$

  • SD = Standard Deviation
  • n = number of measurements (sample size)
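The formula translates directly into code. The sketch below is one possible Python version computed from raw measurements, assuming the SD is the usual sample standard deviation (denominator n - 1, as returned by `statistics.stdev`); the function name and the sample weights are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = SD / sqrt(n), using the sample SD (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements to estimate the SD")
    return statistics.stdev(values) / math.sqrt(n)

# Hypothetical weights in kg, for illustration only (SD = 10 kg, n = 8)
weights = [60, 65, 70, 75, 75, 80, 85, 90]
print(f"SEM = {standard_error_of_mean(weights):.2f} kg")  # SEM = 3.54 kg
```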

The Metroda managers collected the data in successive rounds:

  • With only 192 measurements, M ± SD was 75.9 ± 16.6 kg, so the managers decided to acquire more data.
  • With a new set of 810 measurements, M ± SD was 73.8 ± 16.7 kg, and once again the managers decided to acquire more data.
  • With another set of 3 236 measurements collected over 2 weeks, M ± SD was 74.1 ± 16.4 kg.

The more data there are, the better the population is known, and the more accurate the estimate of its mean.
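Applying the SEM formula to the three Metroda rounds makes this concrete. A minimal Python sketch using only the sample sizes, means and SDs quoted in the list above:

```python
import math

# (n, mean, SD) for the three Metroda rounds, in kg
rounds = [(192, 75.9, 16.6), (810, 73.8, 16.7), (3236, 74.1, 16.4)]

# SEM = SD / sqrt(n) for each round
sems = [sd / math.sqrt(n) for n, _mean, sd in rounds]

for (n, mean, sd), sem in zip(rounds, sems):
    print(f"n = {n:4d}: M = {mean:.1f} kg, SD = {sd:.1f} kg, SEM = {sem:.2f} kg")
```

The SD stays around 16.5 kg in all three rounds, while the SEM drops from about 1.20 kg to 0.59 kg and then 0.29 kg: the population does not become less variable, but its mean is known more precisely.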

Choice of the Parameter

A nutritionist in the city comes across this information:

Weight = 74.1 ± 16.6 kg (n = 3 185).

Her first reaction is:

– 3 185 measurements! This device is a fantastic observatory of body weight!

She asks Metroda:

– May we obtain your data?

When Metroda agrees, a young but rigorous statistics student from the nutrition team objects:

– There is a measurement bias in the body weight! The users are weighed with their clothes and shoes, so all these data are overestimated!

Metroda responds:

– In our subway network, users are not asked to undress to use the lifts. Our method of weight measurement is perfectly suited to our needs.

A few days later, the senior statistician of the nutrition team contacts Metroda:

– Sorry about our earlier comments, your data are very interesting. We calculated the mean weight of clothes and shoes (3.4 kg on average in summer and 4.5 kg in winter).

So, if you agree to provide us with your data regularly, we will be able to monitor the body weight of our fellow citizens, in particular to evaluate the impact of nutrition education campaigns.

The choice of a parameter and of its measurement method depends closely on the question being asked. A measurement bias can damage the conclusions, especially when it is ignored!
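Once the bias has been quantified, correcting for it amounts to subtracting the mean clothing weight. A minimal sketch, assuming the nutritionists' seasonal averages (3.4 kg in summer, 4.5 kg in winter) can be applied as a constant offset to every measurement; the dictionary, the function name and the sample weights are hypothetical.

```python
import statistics

# Mean weight of clothes and shoes reported by the nutritionists (kg)
CLOTHING_OFFSET_KG = {"summer": 3.4, "winter": 4.5}

def correct_for_clothing(weights, season):
    """Subtract the seasonal mean clothing weight from each measurement."""
    offset = CLOTHING_OFFSET_KG[season]
    return [w - offset for w in weights]

# Hypothetical clothed weights in kg (illustrative only)
clothed = [62.0, 70.5, 74.1, 81.3, 95.2]
corrected = correct_for_clothing(clothed, "summer")

print(f"Clothed mean:   {statistics.mean(clothed):.1f} kg, SD {statistics.stdev(clothed):.1f} kg")
print(f"Corrected mean: {statistics.mean(corrected):.1f} kg, SD {statistics.stdev(corrected):.1f} kg")
```

Subtracting a constant shifts the mean by that constant but leaves the SD unchanged, which is why this kind of bias is easy to miss: the data look just as consistent before and after correction. If the Metroda data had been collected in summer, for example, the mean of 74.1 kg would become about 70.7 kg.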


Basic vocabulary: Words you should know

Sampling – Standard deviation – Variance

Technical Vocabulary: Important words to know

Analysis of Variance – Correlation – Standard Error of the Mean – Confidence interval

Advanced Vocabulary: Useful words to know

ANOVA – Confidentiality – Confounding variable – Dependent variable – Independent variable – Percentile – Q1, Q2, Q3 – Quartile


Case study

Standard Deviation and Standard Error of the Mean

To evaluate body weight in a female population, three pilot evaluations are performed using small samples. The means obtained in the three samples range from 65.2 to 70.5 kg (on the left, in blue). This poor accuracy is also reflected in the large Standard Error of the Mean (SEM) values observed for each evaluation (on the right, in red). Let's see what happens when the samples are 4 times bigger.


With samples of about 150 people, i.e. 4 times larger, the range of observed means narrows:

from 68.2 kg to 69.1 kg, i.e. 0.9 kg compared with 2.7 kg previously. The shape of the observed distributions is closer to the typical bell-shaped Gaussian curve. The estimated variability (SD ranging from 12 to 13.1 kg) does not differ much from the previous evaluations, but the SEM is about 2 times smaller.
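The factor of 2 is what the SEM formula predicts. Assuming the SD stays roughly the same when the sample grows, as observed here, multiplying n by 4 divides the SEM by √4 = 2:

```latex
\mathrm{SEM}_{4n} = \frac{\mathrm{SD}}{\sqrt{4n}} = \frac{\mathrm{SD}}{2\sqrt{n}} = \frac{1}{2}\,\mathrm{SEM}_{n}
```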


With samples of about 750 measurements, the three evaluations are very close (68.0 to 68.6 kg), a range representing less than 1% of the mean value.

The larger the sample size, the smaller the variability between the observed means, i.e. the more accurate the estimate of the parameter for the whole population.

Comparing the smallest samples (n = 41, 29 and 46) with the largest (n = 2936, 2973 and 2938) confirms this: as the sample size n grows, the variability between the observed means decreases.
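The case study can be mimicked with a small simulation. The sketch below assumes, for illustration only, that female weights are normally distributed with the mean and SD given at the start of the lesson (68.3 kg and 12.7 kg); the sample sizes roughly follow those of the case study, and the function name is hypothetical. The simulated numbers will not reproduce the figures above.

```python
import math
import random
import statistics

# Assumption: female weights ~ Normal(68.3 kg, 12.7 kg), values quoted earlier
TRUE_MEAN, TRUE_SD = 68.3, 12.7

def pilot_evaluations(n, repeats=3, seed=1):
    """Draw `repeats` samples of size n; return (mean, SEM) for each sample."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
        sem = statistics.stdev(sample) / math.sqrt(n)
        results.append((statistics.mean(sample), sem))
    return results

for n in (40, 150, 750, 3000):
    evals = pilot_evaluations(n)
    means = [m for m, _ in evals]
    sems = [s for _, s in evals]
    print(f"n = {n:4d}: means {min(means):.1f} to {max(means):.1f} kg "
          f"(range {max(means) - min(means):.2f} kg), "
          f"average SEM {statistics.mean(sems):.2f} kg")
```

Whatever the seed, the average SEM falls roughly as 12.7/√n (about 2.0 kg for n = 40 and 0.23 kg for n = 3000), and the range of the three means shrinks accordingly, although with only three samples the range itself fluctuates from run to run.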


Documents for further reading

First Steps in Variability

Range and quartiles, Statistics Canada


P. Nagele. Misuse of standard error of the mean when reporting variability of a sample. Br. J. Anaesth. (2003) 90(4): 514-516.


Squares and Square Roots