Eurordis - Rare Disease Europe


Lesson 4: Sampling



An essential characteristic of clinical research is its multi-step process:

1) Posing the right question and selecting the suitable parameter

2) Acquiring data by observing a limited number of participants

3) Applying the results of the observations to a large population, possibly resulting in changes in clinical practice

In order to be fruitful, research requires:

1) Internal quality, which involves avoiding measurement bias and limiting the risk of error. The measurement must reflect the parameter of interest (i.e. be free of measurement bias), and the observation must be established precisely enough to be relied upon.

2) External quality, which involves ensuring that the participants are representative of the population of interest.

Internal Quality 

Measurement bias:

Measurement bias is a systematic measurement error.

It may result from:

  • poor calibration of an apparatus.
  • unsuitable conditions for data collection (e.g. blood pressure over-estimated by measurement in a stressful environment, alcohol use under-estimated by a self-administered questionnaire).
  • a shift in the measured parameter (e.g. height measured with shoes on).

Contrary to variability in measurement, measurement bias is not easily detected through calculations of the results.

Fortunately, once it is identified, it can usually be partially corrected.

Therefore, it is crucial to carefully track down this possible error in methodology.

Risk and Confidence

It is not the primary role of statisticians to provide an opinion on what is acceptable or questionable. Their role is to evaluate probability.

The concepts of confidence in an observed measurement, of significance in an observed difference and, conversely, of the risk taken in claiming a result are essential in statistics.

Statistics evaluates the probability, the level of risk, that a statement could be true or false.

Then, the acceptance or rejection of a statement depends on the acceptance or rejection of a given level of risk of error.

Confidence Interval

The third tool in the Gaussian tool box, after the Standard Deviation and the Standard Error of the Mean, is the Confidence Interval (CI). A Confidence Interval is a range around a measurement that conveys how precise the measurement is. It is used to evaluate the reliability of an estimate.

So if we return to our subway lift example: weight = 73.90 ±  0.24 kg (M ± SEM):

A Confidence Interval is always stated together with its desired level of confidence (xx% CI). A 95% CI, the level often chosen, corresponds to an accepted 5% risk of error. The higher the desired level of confidence, the wider the range of estimation of the mean.
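As a minimal sketch, the interval for the subway lift example can be computed from the mean and the SEM. The function name `confidence_interval` is hypothetical, and the calculation assumes a Gaussian approximation (a large sample), so it uses the normal critical value rather than Student's t.

```python
from statistics import NormalDist

def confidence_interval(mean, sem, level=0.95):
    """Return (low, high) for a Gaussian confidence interval around the mean."""
    # Two-sided critical value: about 1.96 for 95%, about 2.58 for 99%
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sem
    return mean - half_width, mean + half_width

mean_kg, sem_kg = 73.90, 0.24  # values from the subway lift example
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(mean_kg, sem_kg, level)
    print(f"{level:.0%} CI: [{low:.2f}, {high:.2f}] kg")
```

Under these assumptions the 95% CI is roughly 73.43 to 74.37 kg, and the 99% CI is wider than the 90% CI, as the text describes.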

External Quality


Even when well established (unbiased and accurate), the results obtained in a given sample cannot always be generalised. The ability to make generalisations depends on the representativeness of the participants.

A lack of representativeness of the sample, called sampling bias or recruitment bias, may result from:

  • Unknown reasons, linked to a small sample size in which demographic aspects such as age, ethnicity, gender, socio-economic level etc. of the global population are rarely represented
  • Unsuitable recruitment methods in the study.

In order to limit the measurement bias for body weight, our rigorous statistics student proposed installing electronic scales in the floor of the university swimming pool, because the weight of swimwear is negligible. Unfortunately, its users, mainly young sportswomen and sportsmen, do not represent the global population. These results would be interesting for swimsuit manufacturers but not for health-policy decision makers.

The risk of sampling bias may be limited by a random selection of the participants among the population and by avoiding very small samples.
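The swimming pool example can be illustrated with a small simulation. This is only a sketch: the population sizes and the weight distributions below are invented for illustration and do not come from the lesson.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 2,000 young athletes and 8,000 other adults.
# The weight distributions (kg) are invented for illustration only.
athletes = [random.gauss(68, 8) for _ in range(2000)]
others = [random.gauss(76, 14) for _ in range(8000)]
population = athletes + others

# Convenience sample: only people who use the pool
pool_sample = random.sample(athletes, 200)
# Random sample: every member of the population has the same chance
random_sample = random.sample(population, 200)

print(f"population mean:    {mean(population):.1f} kg")
print(f"pool sample mean:   {mean(pool_sample):.1f} kg")
print(f"random sample mean: {mean(random_sample):.1f} kg")
```

With these invented numbers, the pool sample underestimates the population mean, while the random sample of the same size lands much closer to it.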

Internal vs External Quality

Because the Confidence Interval depends on the observed variance of the parameter, it may be tempting to select participants through strict inclusion criteria, whether stated explicitly or not, in order to limit the heterogeneity of the sample. As a result, questions of general interest are sometimes tested in samples that do not reflect the general population, e.g. Caucasian males from 18 to 25 years old. Unfortunately, these results, even when of high internal quality, apply only to Caucasian males from 18 to 25 years old, i.e. they have poor external quality. This kind of approach may be justified in early phases of clinical research, for example to establish the proof of concept of a possible mechanism of action of a new compound. A negative response would reject the biological hypothesis (for this limited population!). If the concept is proven, further development needs to diversify the participants so that the results can be applied to the global population.


Basic vocabulary: Words you should know

External quality – Inclusion criteria – Selection bias – Treatment Group – Randomisation

Technical Vocabulary: Important words to know

Arm – Bias – Latin square – Confidence interval – Internal quality – Stratified sampling – Universality

Advanced Vocabulary: Useful words to know

Distributive Justice – Extrapolation of foreign  clinical data – Historical Control – Inductive reasoning – Inference – Paired data – Post-hoc analysis – Randomised Trial – Recruitment


Case study

Avoiding Sampling Bias

The impact of sodium intake on arterial pressure remains controversial. The public health department of a university would like to make the students aware of this problem. Three teams of medical students are asked to perform measurements in the campus population. Their results of systolic arterial pressure are as follows:

Discussed in a student assembly, these results generated the following statements:

  • The results of Team B are less accurate because they do not provide a decimal place
  • The results of Team C are more accurate because they are the least variable 
  • The results of Team B are more accurate because of the higher number of measurements
  • A mean value does not make sense because individuals are different
  • The device of team C is probably badly calibrated resulting in lower values

What do you make of these comments? And what is the true value of the mean systolic arterial pressure on the campus?

We don’t know the true value, a statistics student said, but we do know what is probably not the true value. For each team, I calculated the 95% Confidence Interval (95% CI). This is the range outside of which the mean value of the whole population has a probability of less than 5% of lying.

The 95% CI takes into account both the variability and the number of measurements: the larger the sample, the narrower its range.

By measuring in a 4-fold larger sample you will reduce the range by half. I suggest each team performs about 500 measurements, he added.
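The student's rule of thumb follows from the fact that the width of the interval is proportional to SD / √n. A minimal sketch, in which the standard deviation of 15 mm Hg and the sample sizes are hypothetical values chosen for illustration:

```python
from math import sqrt

def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% CI for a mean: z * SD / sqrt(n)."""
    return z * sd / sqrt(n)

sd = 15.0  # hypothetical standard deviation in mm Hg, for illustration only
for n in (125, 500, 2000):
    print(f"n = {n:4d}: mean ± {ci_half_width(sd, n):.2f} mm Hg")
```

Each 4-fold increase in the sample size halves the half-width, whatever the standard deviation.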

  • It will be difficult to find 500 students, the manager of team C said.

Why only students? the statistics student asked.

Remember your aim: to investigate the whole population of the campus, including teachers, and each worker living and eating here!

  • But previously, we only measured students’ arterial pressure, team C added.

That’s a typical mistake in experiments, said the statistics student.

It’s called Sampling Bias. You selected a subgroup of the population, one that is younger and more homogeneous, and so you obtained a lower mean value with a reduced variability.

  • The only good news is that we are no longer suspected of badly measuring the arterial pressure.

Yes. In the case of a Measurement bias, e.g. one linked to the bad calibration of a device, the mean would be shifted, but the variability is usually not modified by this systematic bias.
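The student's last point can be checked with a short simulation. This sketch assumes the simplest form of measurement bias, a constant offset; the pressure distribution and the 5 mm Hg offset are invented for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical true systolic pressures (mm Hg); distribution invented for illustration
true_values = [random.gauss(125, 15) for _ in range(500)]

# A badly calibrated device that reads 5 mm Hg too low on every measurement
offset = -5.0
biased_values = [v + offset for v in true_values]

print(f"true:   mean {mean(true_values):.1f}, SD {stdev(true_values):.1f}")
print(f"biased: mean {mean(biased_values):.1f}, SD {stdev(biased_values):.1f}")
```

The mean shifts by exactly the offset while the standard deviation is unchanged, which is why this kind of bias cannot be spotted from the spread of the results alone.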

A second campaign of measurements is launched:

By avoiding a sampling bias and using larger samples, we observe that:

  • the results of the 3 teams closely converge (around 125 mm Hg)
  • the range of uncertainty is narrowed, from 122.6 to 128 mm Hg according to the teams.

By conducting only one large study with all measurements, the 95% CI would be [124.2, 126.4] mm Hg.

Some comments about the initial remarks:

  • Fortunately, the accidental discussion on methods gave us an explanation for the surprising initial results of team C. Even when a sample may randomly differ from the others, a measurement bias or a sampling bias has to be systematically considered and sought out.
  • As a rule, the larger the sample, the more precise the evaluation, provided the sample is not biased.
  • Initially, team A provided results with 2 decimals. The calculation of the Confidence Intervals reveals that the actual uncertainty was about 8 mm Hg. Reporting hundredths was therefore not useful and perhaps a bit silly.
  • A mean value does not describe each individual but may be an essential milestone for further comparison in the evaluation of new treatments.

