
Eurordis - Rare Disease Europe

22/02/2018

Lesson 5: Statistical Tests



Lesson

Comparative Scheme

Randomised Controlled Trials (RCTs) can be used to provide sets of data that can be compared.

In a population, a group of eligible patients is randomised to one of the compared treatments (A or B).

Following a predefined duration of treatment the two groups are compared according to the chosen endpoint of the study.

Because the allocation of treatments is randomised and the treatments are blinded, the difference observed between the groups may be attributed to the allocated treatments.
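The allocation step described above can be illustrated with a minimal sketch. It assumes simple randomisation into two arms of near-equal size; the function name `randomise`, the patient identifiers and the seed are hypothetical, and real trials often use more elaborate schemes (blocks, stratification) that are not shown here.

```python
import random

def randomise(patient_ids, seed=None):
    """Allocate patients at random to arm A or arm B in near-equal numbers."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)              # random order removes any selection by the investigator
    half = (len(shuffled) + 1) // 2
    return {"A": sorted(shuffled[:half]), "B": sorted(shuffled[half:])}

# Hypothetical example: 96 eligible patients numbered 1 to 96
arms = randomise(range(1, 97), seed=2018)
print(len(arms["A"]), len(arms["B"]))
```

Because chance alone decides who receives A or B, known and unknown patient characteristics tend to balance out between the two arms.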

Tests According to the Variable

The comparison made at the end of a study is based on a statistical test. Numerous tests have been developed, each suited to a type of data.

Let’s have a look at three types of statistical tests: the Student t-test, the χ2 test (chi2), and survival analysis.

  • For the comparison of continuous variables, the Student t-test, one of the earliest to be developed, is still commonly used.
  • For the comparison of binary variables, the χ2 test (chi2), with variations according to the situation and the amount of data, is the most frequently used.
  • In spite of its name, ‘survival analysis’ may be used for any comparison of the evolution of a binary variable over time (e.g. discharge from hospital following an acute disease).

Continuous Variables: Individuals and Means

In a phase II study, two formulations (A & B) of the same anti-hypertensive agent are compared.

 

On the one hand, the individual values in the two groups overlap widely; on the other hand, the Confidence Intervals of the means appear distinct. Therefore, before developing A rather than B, the investigators ask the statisticians whether there is a true difference between the means.

Student t-test

The statisticians respond that they can only answer the question, “Is the observed difference:

  1. probably due to the random fluctuations of sampling observed in any heterogeneous population (hypothesis 0)?

or

  2. probably linked to the treatments (rejection of hypothesis 0)?”

According to the data, the statisticians use the Student t-test and provide the results:

p<0.02 (t = 2.54, d.f. = 104)

  • p is the probability of observing a difference at least as large as the one seen if only random fluctuation were at work (that is, if the null hypothesis were true).
  • t is the result of the calculation: the higher the absolute value of t, the weaker the random hypothesis.
  • d.f. (degrees of freedom) corresponds to the number of values in the final calculation that are free to vary.

By convention, a result with p<0.05 (5%) is considered statistically significant.

So, p<0.02 signifies that, if the treatments did not differ, a difference at least as large as the one observed would arise from random effects alone less than 2% of the time.

The difference between the two antihypertensive agents may be considered statistically significant and taken into account.

The p-value is determined from the t and d.f. values using Student's t distribution, a close relative of the Gaussian (normal) distribution that belongs to the same toolbox.
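As an illustration, the p-value quoted above can be checked from t = 2.54 and d.f. = 104. The sketch below is one possible way to do it with the Python standard library only: it integrates the density of Student's t distribution numerically (Simpson's rule) and assumes a two-sided test, which the lesson does not state explicitly. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: probability of a value beyond -|t| or +|t|.

    The area between 0 and |t| is computed with Simpson's rule
    (steps must be even), then doubled and subtracted from 1.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

p = two_sided_p(2.54, 104)   # values quoted in the lesson
print(f"p = {p:.4f}")
```

Under these assumptions the computed value falls between 0.01 and 0.02, in line with the reported p<0.02.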

Binary Variables

Some variables are binary (survivor/non-survivor; hospitalized/discharged). In other situations, a threshold may be chosen for a continuous variable, dividing the population into 2 subpopulations: over vs under the limit. For example, in order to compare the two anti-hypertensive formulations, another endpoint would be normalisation of the blood pressure, defined as a value <13 cm Hg.

At the end of the study, among the 96 participants, 41 have a blood pressure under 13 cm Hg (24 with A and 17 with B).

The question asked of the statisticians is:

Is the observed difference between the arms:

  • due to the random fluctuations observed in a heterogeneous population (random possibility called hypothesis 0)?

or

  • linked to the different treatments?

Binary Variables: χ2 (Chi-square) Test

 

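The χ2 test takes its name from the Greek letter χ (chi), which denotes the test statistic. The test rests on a crossed comparison: the number of participants observed in each cell of the treatment-by-outcome table is compared with the number expected if the treatments did not differ. The larger the discrepancy between observed and expected numbers, the stronger the evidence against hypothesis 0.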
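A minimal sketch of this calculation on the counts quoted above (24 normalised with A, 17 with B) may help. It assumes arm sizes of 49 and 47, as given in the case study of this unit, and uses the plain Pearson χ2 without continuity correction; the lesson does not say which variant the statisticians used, so the resulting p-value need not match the one reported in the case study. The function name is hypothetical.

```python
import math

def chi_square_2x2(success_a, n_a, success_b, n_b):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table."""
    observed = ((success_a, n_a - success_a),
                (success_b, n_b - success_b))
    n = n_a + n_b
    row_totals = (n_a, n_b)
    col_totals = (success_a + success_b, n - success_a - success_b)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # count expected in this cell if the treatments did not differ
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # for 1 degree of freedom the p-value has this closed form
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = chi_square_2x2(24, 49, 17, 47)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

With these counts the p-value stays above the usual 5% limit, so hypothesis 0 is not rejected.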

Survival Analysis

In some situations, it is interesting to follow the evolution of the endpoint among the participants over time. The first endpoint studied in this way was survival, which gave its name to this type of analysis and to its Kaplan-Meier, or survival, curves.

Each event in either group is recorded at the time it occurs. These methods consider the difference between arms globally over the whole period of study, rather than at a given time.

In the example, treatments A and B are compared in the treatment of prostate cancer. Survival curves describe the outcome in the two arms.

Outcomes may be compared:

  • at a given time: a comparison done at 3.5 years would not show a significant difference, while a comparison done at 4 years would.
  • for a given survival rate: the median survival time is 5.8 years with A vs 3.8 years with B.
  • globally: the survival analysis avoids partial conclusions, which is especially important when the difference between treatments fluctuates.

The calculation works like a succession of χ2 tests, one performed at the time of each event, throughout the whole period of analysis.

By pooling all these observations, this method takes into account a larger amount of information than a single analysis at a given time, resulting in an increased power of conclusion.
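To make the survival curve itself more concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) estimate for a single arm. The follow-up times and event flags are hypothetical, invented for illustration only, and the comparison between arms (the succession of χ2-like tests described above) is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where an event occurs.

    times  : follow-up time of each participant
    events : True if the event was observed at that time,
             False if follow-up simply stopped (censored)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # group all participants whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            # multiply by the fraction of those at risk who pass time t
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical arm: times in years, False marks a censored participant
times = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
events = [True, True, False, True, False, True]
for t, s in kaplan_meier(times, events):
    print(f"{t:.1f} years: {s:.3f}")
```

Censored participants lower the number at risk without producing a step in the curve, which is how the method uses every observation up to the moment follow-up ends.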

 

Basic vocabulary: Words you should know

Chi2 test – Parametric test – Survival analysis – Statistical test – t test

Technical Vocabulary: Important words to know

Alpha error – Alternative hypothesis – Bias – Beta error – Binary variable – Continuous variable – Null hypothesis

Advanced Vocabulary: Useful words to know

Analysis of Variance (ANOVA) – Analysis plan – Confounding variable – Correlation – Global assessment variable – Hypothesis – Interaction (qualitative and quantitative) – Logistic regression

 

Case study

Binary vs Continuous Variable

Two Approaches in Testing

The Lesson section of this unit presents the results of two ways of evaluating the efficacy of a novel anti-hypertensive agent (A) vs a standard treatment (B) in patients with mild arterial hypertension.

Comparing these two sets of results reveals an apparent discrepancy, even though the data come from the same trial!

  • Endpoint 1 for testing anti-hypertensive efficacy: mean value of systolic arterial pressure following a 3-month treatment.
  • Endpoint 2 for testing anti-hypertensive efficacy: percentage of patients in whom the systolic arterial pressure is less than 13 cm Hg following a 3-month treatment.

Q1: Of these 2 endpoints, which one is continuous and which one is binary?

Q2: Which statistical test is appropriate for endpoint 1 and for endpoint 2, respectively?

Variables, Tests and Results

Endpoint 1 for testing anti-hypertensive efficacy: mean value of systolic arterial pressure following a 3-month treatment.

  • The systolic arterial pressure is a continuous variable (the response is a value on a graduated scale): the comparison of means can be performed using a Student t-test.

Endpoint 2 for testing anti-hypertensive efficacy: percentage of patients in whom the systolic arterial pressure is less than 13 cm Hg following a 3-month treatment.

  • Whether a patient's systolic arterial pressure is less than 13 cm Hg is a binary variable (the response is Yes or No): the comparison of percentages can be performed using a χ2 test.

Results and statistical testing are as follows:

  • Endpoint 1

Treatment A 13.1 ± 0.27 cm Hg (n=49)

Treatment B 14.2 ± 0.34 cm Hg (n=47):

p<0.02

  • Endpoint 2

Treatment A 51% (n=49)

Treatment B 35% (n=47):

p=0.11, Not Significant

Using endpoint 2 seems to result in a lower ability to reach a conclusion.
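The endpoint 1 result can be retraced from the summary values above with a short sketch. It assumes that the figure after ± is the standard error of the mean, an interpretation the unit does not state but which gives a t value close to the 2.54 quoted in the Lesson section. It also uses the unpooled form of the statistic; the unit does not say which form the statisticians used. The function name is hypothetical.

```python
import math

def t_from_summary(mean_a, sem_a, mean_b, sem_b):
    """t statistic for a difference of two means, given their standard errors."""
    # standard error of the difference between two independent means
    se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return (mean_b - mean_a) / se_diff

# Endpoint 1: A = 13.1 ± 0.27 cm Hg, B = 14.2 ± 0.34 cm Hg
# (± assumed to be the standard error of the mean)
t = t_from_summary(13.1, 0.27, 14.2, 0.34)
print(f"t = {t:.2f}")
```

The difference of 1.1 cm Hg is about two and a half times its own standard error, which is what makes it unlikely under hypothesis 0.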

Q1: Is this difference in statistical power surprising? And if so, why?

Q2: Taking into account these statistical aspects, which endpoint will you use for comparing anti-hypertensive agents?

Statistical Power 


  • Endpoint 1 for testing anti-hypertensive efficacy: mean value of systolic arterial pressure following a 3-month treatment.

Treatment A 13.1 ± 0.27 cm Hg (n=49)

Treatment B 14.2 ± 0.34 cm Hg (n=47):

p<0.02

  • Endpoint 2 for testing anti-hypertensive efficacy: percentage of patients in whom the systolic arterial pressure is less than 13 cm Hg following a 3-month treatment.

Treatment A 51% (n=49)

Treatment B 35% (n=47):

p=0.11, Not Significant

  • For endpoint 1, the data used are the actual values of systolic arterial pressure.
  • For endpoint 2, the data used are only the comparison of each value with a given threshold.
  • The value allows us to determine the category (more or less than 13 cm Hg); conversely, the category (<13 vs >13) does not provide the individual value.
  • In other words, the individual value contains more information than the category.
  • The quantity of information is a determining factor of statistical power, so such a large difference between the two test results is not really surprising.
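The loss of power can be illustrated by simulation. The sketch below rests on several assumptions that are not in the unit: blood pressures follow a normal distribution, the true means equal the observed ones (13.1 and 14.2 cm Hg), the figures after ± are standard errors (so the standard deviation is the standard error times the square root of n), and fixed 5% critical values are used (about 1.99 for t with roughly 94 degrees of freedom, 3.84 for χ2 with 1 degree of freedom). All function names are hypothetical.

```python
import math
import random

def t_statistic(xs, ys):
    """Unpooled t statistic for the difference between two sample means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

def chi_square(xs, ys, threshold):
    """Pearson chi-square after splitting each value at the threshold."""
    a = sum(x < threshold for x in xs)
    b = len(xs) - a
    c = sum(y < threshold for y in ys)
    d = len(ys) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:               # every value on the same side of the threshold
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom

def estimate_power(n_trials=2000, seed=1):
    """Fraction of simulated trials in which each test reaches p<0.05."""
    rng = random.Random(seed)
    sd_a = 0.27 * math.sqrt(49)  # assumed: SD = standard error * sqrt(n)
    sd_b = 0.34 * math.sqrt(47)
    hits_t = hits_chi2 = 0
    for _ in range(n_trials):
        xs = [rng.gauss(13.1, sd_a) for _ in range(49)]
        ys = [rng.gauss(14.2, sd_b) for _ in range(47)]
        hits_t += abs(t_statistic(xs, ys)) > 1.99
        hits_chi2 += chi_square(xs, ys, 13.0) > 3.84
    return hits_t / n_trials, hits_chi2 / n_trials

power_t, power_chi2 = estimate_power()
print(f"t-test: {power_t:.2f}, chi-square: {power_chi2:.2f}")
```

Under these assumptions the t-test on the individual values detects the difference in a clearly larger share of simulated trials than the χ2 test on the categories, which is the information argument made above.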

The “Best” Endpoint

Endpoints 1 and 2 do not answer the same question.

  • Endpoint 1 answers:

Is the treatment able to decrease systolic arterial pressure overall in a given population?

In such an average-based approach, a decrease of 1 cm Hg in each of four persons is equivalent to a decrease of 4 cm Hg in one person, and a decrease from 18 cm Hg to 16 cm Hg is equivalent to a decrease from 15 to 13 cm Hg.

  • Endpoint 2 answers:

Is the treatment able to reach an individual clinical target: to lower the arterial pressure below a threshold defined as pathological?

This approach does not distinguish between a change from 18 to 12 and a change from 14 to 12: both count as successes. Nor does it distinguish between a change from 18 to 14 and a change from 15 to 18: both count as failures.

The endpoint has to be chosen according to which question is more relevant.

Endpoint 1 may be preferred to show the pharmacological activity of a compound in hypertensive patients.

Endpoint 2 may be preferred to test a new drug as first-line treatment for controlling arterial pressure below a critical value.

 

Documents for further reading

T Test Online Calculator

Compare continuous data sets yourself. Tools for Science, www.physics.csbsju.edu

CHI 2 Test Online Calculator

An interactive calculation tool for chi-square tests of goodness of fit and independence. Kristopher J. Preacher, University of Kansas, Department of Psychology

Clinical Trials and Gender

The inclusion of Women in Clinical Trials: Are We Asking the Right Questions? Lippman, A. (2006). Toronto, ON: Women and Health Protection.

A “Student” Named Gosset

An inspiring student. Boland, Philip J., University College Dublin, School of Mathematical Sciences