Evidence-based medicine and clinical trials - from a clinical trials unit perspective
Natalie J. Ives
Birmingham Clinical Trials Unit, University of
Birmingham, Birmingham, UK
Address for correspondence:
Natalie Ives,
Birmingham Clinical Trials Unit,
University of Birmingham,
Park Grange, 1 Somerset Road,
Edgbaston, Birmingham B15 2RR, UK
Tel: +44-(0)121-687-2324 Fax: +44-(0)121-687-2313
Email: n.j.ives@bham.ac.uk
Abstract
It is a challenge to clinicians and policy-makers to
determine which treatments are both efficacious and
cost-effective. Evidence-based medicine is an essential
part of healthcare policy and includes the processes of
systematically identifying clinical evidence, appraising
it critically and acting on the evidence of treatment
effectiveness. However, interpreting the evidence is
not straightforward and requires an understanding of
statistics, clinical trials and meta-analysis. The aim of this
paper is to describe the importance of evidence-based
medicine from the viewpoint of clinical trialists and
statisticians.
Introduction
Evidence-based medicine (EBM) has been defined as the
"conscientious, explicit, and judicious use of current best
evidence in making decisions about the care of individual
patients", which is achieved by "integrating individual
clinical expertise with the best available external clinical
evidence from systematic research" [1]. EBM forms an
essential part of healthcare policy, with clinicians
required to assess a large evidence-base to support their
clinical decisions on how best to treat their patients.
EBM includes the process of systematically identifying
the appropriate evidence, appraising it critically and
then synthesising and acting on the evidence. However,
interpretation of the evidence is not straightforward:
randomised controlled trials (RCTs) are the gold standard
for the comparison of treatments in a clinical setting,
but it is not always easy to determine their quality,
validity and relevance. Expert reviews and meta-analyses
are additional sources of information on treatment
efficacy and clinicians also need the skills to assess these
in a reliable and un-biased manner.
EBM and Critical Appraisal of Clinical Trials
RCTs are the most reliable method for assessing
treatment and non-randomised studies should be
avoided. Non-randomised trials often lead to false positive results, and the treatment effect may be
exaggerated in uncontrolled non-randomised studies [2]. A paper reporting the comparison of two treatments
for acute myeloid leukaemia in the elderly illustrated
the problem of using historical controls. The induction
death rate for the 3-drug SAB regimen was found to be
significantly lower than with the standard treatment
of DAT (using historical controls) (15% versus 30%;
p=0.00007) [3]. However, the SAB regimen was 'Same as
Before', meaning that both groups of patients received
the same treatment (i.e. DAT). How can there be a
significant difference between two groups receiving the
same treatments? Possibly clinicians became better at
managing DAT treatment with more experience (i.e. they
managed the side-effects better) or supportive care may
have improved over time, so that the outcome with DAT
was significantly better than previously. Regardless of the
precautions that are taken, comparisons using historical
or other non-randomised controls are always likely to
be subject to moderate biases, the exact size of which
cannot be predicted reliably [2,4,5].
Well-designed and properly executed RCTs are the gold
standard for comparing treatments, but trial results
should not be taken at face value without critically
appraising the quality, validity and relevance. There are
a variety of tools available to aid the clinician in this,
including a critical appraisal tool adapted from two
papers by Guyatt that provides a useful guide with ten
systematic questions designed to assist in the appraisal
process [6,7].
1. Is the study well designed?
2. Has the study been analysed correctly?
3. What are the results and have they been
interpreted correctly?
4. Will the results help you locally?
Is the study well designed?
Is the study design appropriate and will it provide
a reliable answer? Most trials use a parallel-group
design where one or more treatments are compared to
placebo/control or standard therapy and each patient
receives only one of the study interventions (between
group comparison). However, other trial designs may
sometimes be more appropriate. In crossover studies,
patients receive each study intervention in successive
periods with the sequence of treatments determined
at random (within group comparison), and because
participants act as their own control, this design requires
fewer patients than a parallel-group design. However,
crossover trials are not always appropriate as they
can only really assess short-term treatment effects in
patients with stable and/or chronic diseases. In factorial
trials, two or more interventions are evaluated separately,
in combination and against a control. This design has
been under-used in the past, but can be very efficient
and allows the investigation of possible interactions
between treatments, which is not possible
in other trial designs.
Most trials are designed to determine efficacy based
on observing a pre-defined difference between the
two treatments. However, new treatments are often
advocated with claims of equal effectiveness, but
with fewer (or less severe) side-effects or better cost-effectiveness.
In this case, the aim is to show that
an experimental treatment is either equally effective
(equivalence trial) or not worse (non-inferiority trial)
than the active control. When designing equivalence or
non-inferiority trials, a maximum allowable difference
(or equivalence margin) needs to be specified.
In equivalence trials, the two treatments are considered
equivalent if the observed treatment difference is no
greater (in either direction) than this equivalence margin.
In contrast, non-inferiority trials aim to show that an
experimental treatment is not worse than an active
control by more than the equivalence margin, with an improvement of any size fitting in with the definition
of non-inferiority. In reality, it is very difficult to prove
that two treatments have exactly equivalent treatment
effects, and these types of trials generally require larger
sample sizes, so that equivalence or non-inferiority can
be established with sufficient confidence. Furthermore,
in trials aimed at detecting a pre-defined difference
between treatments, failure to show a difference does
not mean that the two treatments are equivalent.
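The distinction between these conclusions can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed analysis method: the function name `classify`, the convention that positive differences favour the experimental treatment, and the example intervals are all assumptions made for the sketch.

```python
def classify(ci_low, ci_high, margin):
    """Classify a result from the CI of (experimental - control).

    Assumes positive differences favour the experimental treatment and that
    `margin` is the pre-specified maximum allowable difference (positive).
    """
    return {
        # Whole CI lies inside (-margin, +margin)
        "equivalent": -margin < ci_low and ci_high < margin,
        # Lower limit rules out a deficit larger than the margin
        "non_inferior": ci_low > -margin,
        # Lower limit rules out no difference
        "superior": ci_low > 0,
    }


# Hypothetical confidence intervals, with a margin of 5 percentage points
print(classify(-2.0, 3.0, 5.0))   # narrow CI inside the margin
print(classify(-1.0, 9.0, 5.0))   # could be much better, so not equivalent
print(classify(-8.0, 4.0, 5.0))   # CI includes zero but is too wide
```

The third example is the situation described above: the interval includes zero, so no difference has been shown, yet it is too wide to support a claim of either equivalence or non-inferiority.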
The trial report should provide information on how
the sample size was calculated [8]. Generally, trials are
designed with the power set at either 80% or 90%,
with 90% power meaning that if a real difference of
the anticipated size exists, the probability of finding
a significant difference between the treatments, with
the given sample size, is 0.9. It is important to consider
the magnitude of the difference that the trial is aiming
to detect and whether the study is recruiting enough
patients to answer the question reliably. Problems arise
when there is an over-optimistic expectation about
the likely treatment effect - the key question is then
whether the possible treatment effect is a moderate
one that is still worth knowing about, or if it is too small
to matter. The medical literature is littered with trials
that were too small to answer reliably the question
of interest [9]. For example, in 55 trials of tamoxifen
versus placebo for early breast cancer, only 6 studies
reported a statistically significant survival benefit with
tamoxifen. However, when the data from all 55 trials
were combined in a meta-analysis, the results showed
that tamoxifen reduces the risk of death by nearly 15%
(p<0.00001) (Figure 1) [10]. How can the evidence for the
benefits of tamoxifen be so convincing if only 6 trials
reported a statistically significant survival benefit?
It has been suggested that the anticipated benefit
from tamoxifen was unrealistically large, meaning that
most trials were under-powered to detect the smaller,
but clinically important, benefit of tamoxifen which
became apparent when all the data were combined in
a meta-analysis.
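The effect of an over-optimistic anticipated benefit on the required sample size can be illustrated with the usual normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the function name `patients_per_arm` and the event rates are hypothetical and are not taken from the tamoxifen trials.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treatment, power=0.9, alpha=0.05):
    """Approximate patients needed per arm to compare two event rates.

    Uses the standard normal-approximation formula for a two-sided test.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # about 1.96 for a two-sided 5% level
    z_beta = z(power)            # about 1.28 for 90% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    difference = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Hypothetical figures: 30% mortality in the control arm
optimistic = patients_per_arm(0.30, 0.20)    # hoped-for one-third reduction
moderate = patients_per_arm(0.30, 0.255)     # 15% proportional reduction
print(optimistic, moderate)
```

Under these assumed rates, a trial sized for the optimistic effect would recruit only a fraction of the patients needed to detect the moderate one, which is why such a trial is likely to be inconclusive on its own.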
Although randomisation is vital in any trial, randomisation alone is not enough, and concealment of allocation is also important (i.e. the clinician should not be able to predict the next treatment allocation) [11,12]. If there is any chance that the clinician can guess the treatment that the next patient will receive, then the decision to enter a patient into the trial may depend on the perceived treatment they would receive, which may result in systematic differences in the type of patients selected for one treatment rather than the other (selection bias) [13]. For these reasons, randomisations based on date of birth or day of the week are seriously flawed. The treatment effect may also be exaggerated, with larger treatment effects reported from trials without adequate concealment compared to adequately concealed trials [14].
The blinding of the study (i.e. did the patient and/or clinician know what treatment the patient was receiving?) is another important consideration, especially in trials where the outcome could be influenced by knowledge of the treatment (e.g. quality of life endpoints). Larger treatment effects have been found in non-blinded studies in comparison with their double-blind equivalents [14,15]. There is less need for blinding in trials where the outcome is not subjective (e.g. disease recurrence or death), and in some situations blinding may be difficult (e.g. surgery trials). Non-blinded studies should not, however, be regarded as poorer quality, but it is important to consider whether the outcome could potentially be biased by the patient (and/or clinician) knowing what treatment they are receiving (measurement bias).
1. What is the aim of the study?
2. Is the study large enough? Is the treatment effect realistic?
3. Is the randomisation procedure robust?
4. Is the study blinded? Is blinding necessary?
5. Are the intervention and comparator treatments appropriate?
6. Are the outcome measures appropriate?
Has the study been analysed correctly?
Having decided that the study is well designed, how can the validity of the results be determined? All trials should be analysed using the intention-to-treat (ITT) method: data on all randomised patients should be analysed according to the treatment allocated, regardless of whether they actually received this treatment or not. The reasons for ITT analysis have been discussed in numerous papers [16-19], but the main ones are that it minimises the potential for bias, avoids selective exclusion of patients and provides the most reliable assessment of treatment efficacy.
Are all randomised patients accounted for?
Unfortunately, there will always be patients who withdraw from treatment or are lost to follow-up. It is important that this number should be minimised, as there are likely to be systematic differences in the types of patients who remain in the trial compared to those who drop out (attrition bias) [16,17]. The aim is to get complete follow-up on each patient, but if there are withdrawals, then hopefully the numbers (and reasons for withdrawal) are similar across each arm. The CONsolidated Standards of Reporting Trials (CONSORT) statement was devised to facilitate evaluation of the validity of a trial's results [20,21]. This checklist and flow diagram are aimed at improving the quality of reporting of RCTs, with trials submitted for publication expected to include the CONSORT flow diagram in the report (Figure 2).
What are the results?
There are various ways in which the treatment effect can be reported (relative risk, odds ratio, odds reduction, hazard ratio, mean change). However, alongside this point estimate for the treatment effect (which is just an average treatment effect), it is important to know how precise it is. Calculation of the confidence interval (CI) for the point estimate provides a measure of certainty and is essential for the assessment of treatment efficacy. The CI gives the range within which the true treatment effect is likely to lie; the wider the CI, the less certain is the estimate of treatment efficacy. The standard is to use the 95% CI, which means that if the trial were repeated many times and a CI calculated each time, about 95% of those intervals would contain the true treatment effect. Therefore, both the point estimate and, more importantly, the corresponding CI are needed to determine treatment efficacy.
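As an illustration of why the CI matters as much as the point estimate, the following sketch computes a relative risk with a 95% CI using the common log-scale approximation. The function name `relative_risk_ci` and the patient numbers are hypothetical, chosen so that a small and a large trial give the same point estimate.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(events_t, n_t, events_c, n_c, level=0.95):
    """Relative risk (treatment vs control) with a CI on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) from the usual large-sample approximation
    se = sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical trials with identical event rates (15% vs 20%)
small = relative_risk_ci(15, 100, 20, 100)
large = relative_risk_ci(150, 1000, 200, 1000)
print("small trial: RR %.2f (95%% CI %.2f to %.2f)" % small)
print("large trial: RR %.2f (95%% CI %.2f to %.2f)" % large)
```

Both hypothetical trials report the same relative risk, but only the larger one has a CI narrow enough to exclude a relative risk of 1 (no difference).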
1. Was the trial analysed using an intention-to-treat analysis?
2. Are all randomised patients accounted for?
3. Were the data analysed using the correct statistical methods?
4. How are the results reported (i.e. odds ratio, hazard ratio etc.)?
5. How precise are the results? (i.e. are p-values and confidence intervals provided?)
Will results help you locally?
The main purpose of critical appraisal is to determine whether the treatment could be used in clinical practice. Again, it is not sufficient to take the trial results at face value. Points to consider include:
- Are the types of patients randomised into the trial the same as the patients that would be treated in practice (i.e. did the trial have broad eligibility criteria and are the trial results generalisable)?
- Can the same treatment (especially in non-drug interventions like physiotherapy or occupational therapy) be provided locally?
- Treatment efficacy versus costs.
Meta-analysis is an evaluation of the totality of the evidence, which is achieved by bringing together all available data from all randomised trials that address the same question in patients with the same disease (e.g. trials of tamoxifen versus placebo in women with breast cancer, as discussed earlier). But why perform meta-analyses? Most individual trials are too small to provide reliable answers on their own, and it is impossible to make decisions about treatments based on individual trial results. However, using meta-analysis to combine data from a number of trials and obtain an overall treatment effect means that there is no undue emphasis on any particular study, be it positive or negative. As with clinical trials, the results of meta-analyses should not be taken at face value: the quality of a meta-analysis is dependent on the quality of the trials included in it, and publication bias can be problematic [22]. Nevertheless, meta-analyses are essential for the assessment of treatment efficacy and provide the most reliable and un-biased assessment of the true treatment effect. It is important to note that the results of individual trials are unlikely to change clinical practice. However, by using meta-analysis and assessing the totality of the evidence, clinicians and healthcare policy-makers can make informed and evidence-based decisions about treatments more reliably (e.g. the tamoxifen for breast cancer overview (Figure 1)).
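How combining trials sharpens the estimate can be sketched with a simple fixed-effect, inverse-variance meta-analysis of odds ratios. This is only one of several pooling methods and is not necessarily the method used in the overviews cited here; the function names and the five trials below are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return (log odds ratio, variance) for one two-arm trial."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed_effect(trials, level=0.95):
    """Inverse-variance pooled odds ratio with its confidence interval."""
    estimates = [log_odds_ratio(*trial) for trial in trials]
    weights = [1 / variance for _, variance in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    se = sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(pooled), exp(pooled - z * se), exp(pooled + z * se)


# Hypothetical trials: (deaths on treatment, n, deaths on control, n)
trials = [
    (40, 200, 50, 200),
    (18, 100, 22, 100),
    (55, 300, 70, 300),
    (12, 80, 15, 80),
    (30, 150, 38, 150),
]
print("pooled OR %.2f (95%% CI %.2f to %.2f)" % pool_fixed_effect(trials))
```

In this made-up example each trial favours treatment but is too small for its own CI to exclude an odds ratio of 1, whereas the pooled CI does exclude it. Larger trials receive more weight because their estimates have smaller variance.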
EBM and designing a clinical trial - ASTRAL
The development of any trial requires a similar process of reviewing the evidence as that described above. The Birmingham Clinical Trials Unit co-ordinates the ASTRAL trial, which compares angioplasty and/or stent placement with medical treatment for atherosclerotic renovascular disease (ARVD). The trial is jointly funded by the Medical Research Council and Kidney Research UK.
Reviewing the evidence
It is essential to check that the question has not been addressed in a previous trial or by ongoing research. If the question remains unanswered, then previous trials will help in defining the question and designing the study protocol - what were the study designs, what type of patients were included, what were the outcomes, what was the sample size?
A literature search of publications related to ARVD identified 95 potentially relevant articles, which was reduced to 5 RCTs (Figure 3) [23-27]. The trials were all parallel-group designs, with 3 trials of angioplasty versus medical management, 1 trial of surgery versus angioplasty and 1 trial of angioplasty versus angioplasty plus stent. The aim of the two trials comparing surgical interventions [23,26] was to compare patency or restenosis rates. In comparison, the three trials comparing angioplasty with medical management assessed blood pressure response [24,25,27].
The 3 trials comparing angioplasty with medical management showed that blood pressure and serum creatinine were improved in the angioplasty group, but the differences were not statistically significant. A meta-analysis of these trials confirmed these results, and concluded that previous trials were too small: while the combined data "exclude the possibility of a large improvement in renal function or hypertension after angioplasty, a moderate but clinically worthwhile benefit cannot be ruled out" (Figure 4) [28]. Importantly, this meta-analysis and other review articles supported the need for further large-scale randomised evidence [28-33].
Clinical opinion
Finding that the medical literature supports the need for a trial is not sufficient. It is also important to canvass clinicians to assess their interest in the proposed question (and to aid in defining the question), as without the support of the clinicians who will be randomising patients into the trial, it is unlikely to succeed.
Defining the question
A clear question has several key components:
- What patients are to be included?
- What are the treatments (intervention and comparator)?
- What outcomes should be collected?
PICO rule - population, intervention, comparator, outcome(s)
| ASTRAL: PICO Rule of Defining the Question | |
| Clinical problem: | Does revascularisation delay progressive decline in renal function? |
| Study design: | RCT, parallel group design |
| Sample size: | 1000 patients |
| Follow-up: | 5 years |
| Population: | Patients with at least one ARVD lesion suitable for revascularisation confirmed angiographically |
| Intervention: | Revascularisation (balloon angioplasty with or without stent insertion) |
| Comparator: | Medical management |
| Outcomes: | Primary: renal function. Secondary: blood pressure, renal events, cardiovascular events, death |
The primary outcome in ASTRAL is the mean slope of the reciprocal creatinine plot versus time. The sample size was based on detecting a moderate reduction of 20% in this slope (i.e. a reduction from -1.6 × 10⁻³ l/µmol/year to -1.28 × 10⁻³ l/µmol/year [34]). This gave a sample size (with 80% power) of 750 patients, which was increased to 1000 patients to allow for patient non-compliance and withdrawals.
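The arithmetic behind these figures can be written out as follows. The first line simply restates the 20% reduction in slope. The second line is an inference rather than a statement from the trial protocol: dividing by one minus the expected loss is a common way of inflating a sample size, and on that assumption the increase from 750 to 1000 corresponds to an allowance of about 25%.

```latex
\begin{align*}
\text{target slope} &= (1 - 0.20) \times \left(-1.6 \times 10^{-3}\right)
                     = -1.28 \times 10^{-3}\ \text{l}/\mu\text{mol/year} \\
\text{inflated sample size} &= \frac{750}{1 - 0.25} = 1000 \quad \text{(assumed 25\% allowance)}
\end{align*}
```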
Patients are randomised into the ASTRAL trial by either a telephone call to the central randomisation office or using the Internet randomisation service, thus ensuring concealment of next treatment allocation. Patients are allocated to either revascularisation or medical management, with the randomisation procedure based on the method of minimisation and stratified by baseline serum creatinine, glomerular filtration rate, percent stenosis, renal length and rate of disease progression [35,36]. Since the trial outcomes (serum creatinine, blood pressure, major events) are not subjective, it was not necessary to blind patients or clinicians to the allocated treatment.
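The general idea of minimisation [35,36] can be sketched as follows. This is a simplified, hypothetical illustration and not the ASTRAL randomisation program: the factor names and levels are invented, all factors are weighted equally, and ties are broken at random. Practical implementations often also add a random element to every allocation.

```python
import random

ARMS = ("revascularisation", "medical")


def minimise(new_patient, allocated, rng=random):
    """Choose the arm that minimises imbalance for this patient's factor levels.

    new_patient: dict mapping factor -> level, e.g. {"creatinine": "high"}
    allocated:   list of (patient_dict, arm) for patients already randomised
    """
    totals = {}
    for arm in ARMS:
        # Count earlier patients in this arm who share each of the
        # new patient's factor levels, summed over all factors
        totals[arm] = sum(
            1
            for patient, previous_arm in allocated
            if previous_arm == arm
            for factor, level in new_patient.items()
            if patient.get(factor) == level
        )
    smallest = min(totals.values())
    candidates = [arm for arm in ARMS if totals[arm] == smallest]
    return rng.choice(candidates)  # random choice only when arms are tied


# Hypothetical patients already in the trial
allocated = [
    ({"creatinine": "high", "stenosis": "severe"}, "revascularisation"),
    ({"creatinine": "high", "stenosis": "moderate"}, "revascularisation"),
    ({"creatinine": "low", "stenosis": "severe"}, "medical"),
]
print(minimise({"creatinine": "high", "stenosis": "severe"}, allocated))
```

Because the allocation depends on the characteristics of all patients entered so far, and is computed centrally, the recruiting clinician cannot predict it in advance.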
In any long-term trial, there is the potential for new techniques or technology to be developed, which could have a detrimental impact on recruitment. Therefore, it is important that the trial design is pragmatic and adaptable to allow for such developments. Originally patients were eligible for the ASTRAL trial if they had at least one ARVD lesion suitable for revascularisation that was confirmed angiographically. However, during the course of the trial, following advances in the accuracy of imaging techniques, this was expanded so that patients could be randomised into the trial based on ARVD being confirmed by angiography, magnetic resonance angiography (MRA) or computed tomography (CT). Patients entered into the trial based on MRA or CT and who were randomised to revascularisation, were required to undergo angiography prior to the intervention, so that the diagnostic accuracy of these imaging methods could be compared to the gold standard (angiography).
Where are we now?
ASTRAL began recruiting in September 2000, and as of June 2006, 674 patients from 56 centres (including 3 in Australia and 1 in New Zealand) have been randomised into the trial. The trial remains open to recruitment until April 2007, although long-term follow-up of all patients will continue. The final analysis, once all patients have been followed-up for at least 6 months, is scheduled for early 2008.
Conclusion
In an era of expensive treatments and limited budgets, clinicians and healthcare policy-makers have the difficult task of assessing the evidence-base to determine which treatments are both efficacious and cost-effective. Evidence-based medicine and critical appraisal are an essential part of the assessment of treatment efficacy, as despite clinical trials being the gold standard for the comparison of many treatments in a clinical setting, not all clinical trials are of good quality. Interpreting the evidence and ensuring the assessment of treatment is performed in a reliable and un-biased manner is not straightforward, and requires an understanding of statistics, clinical trials and meta-analysis. Clinical trial units have the experience of both clinical trial design and statistics to provide advice and support to clinicians undertaking clinical research.
- Evidence-based medicine and critical appraisal form an essential part of healthcare policy
- Finding and, more importantly, correctly interpreting the evidence is not straightforward - it requires an understanding of statistics, clinical trials and meta-analysis
- Clinical trials are the gold standard for the comparison of many treatments in a clinical setting, but the rationale behind clinical trials is often misunderstood
- Clinical trial units have the experience of both clinical trial design and statistics to provide advice and support to clinicians undertaking clinical research
References
1. Sackett DL, Rosenberg WMC, Gray JAM, et al. Evidence based medicine: what it is and what it isn't. BMJ 1996;312:71-2.
2. Chalmers TC, Matta RJ, Smith H, et al. Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. N Engl J Med 1977;297:1091-6.
3. Wheatley K. SAB - a promising new treatment to improve remission rates in AML in the elderly? Br J Haematol 2002;118:432-3.
4. Carroll D, Tramer M, McQuay H, et al. Randomization is important in studies with pain outcomes: systematic review of transcutaneous electrical nerve stimulation in acute postoperative pain. Br J Anaesth 1996;77:798-803.
5. Collins R, Peto R, Gray R, et al. Large-scale randomized evidence: trials and overviews. In: Oxford Textbook of Medicine (eds D. Weatherall, JGG Ledingham & DA Warrell), pp. 21-32. Oxford University Press, Oxford.
6. Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 1993;270:2598-2601.
7. Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA 1994;271:59-63.
8. Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and mystical. Lancet 2005;365:1348-53.
9. Wheatley K, Stowe RL, Clarke CE, et al. Evaluating drug treatments for Parkinson's disease: how good are the trials? BMJ 2002;324:1508-11.
10. Early Breast Cancer Trialists' Collaborative Group. Tamoxifen for early breast cancer: an overview of the randomised trials. Lancet 1998;351:1451-67.
11. Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet 2002;359:515-9.
12. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet 2002;359:614-8.
13. Keirse MJNC. Amniotomy or oxytocin for induction of labor: Reanalysis of a randomized controlled trial. Acta Obstet Gynecol Scand 1988;67:731-5.
14. Schulz KF, Chalmers I, Hayes RJ, et al. Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408-12.
15. Ernst E, White AR. Acupuncture for back pain. A meta-analysis of randomized controlled trials. Arch Intern Med 1998;158:2235-41.
16. Schulz KF, Grimes DA. Sample size slippages in randomised trials: exclusions and the lost and wayward. Lancet 2002;359:781-5.
17. Gray R, Stowe RL, Hills RK, et al. Non-random drop-out bias: intention to treat or intention to cheat? Control Clin Trials 2001;22:38S-39S (Abstract).
18. Hills RK, Richards SM, Wheatley K. Corner cutting compromises clinical trials: the inherent problems with up-front randomisation and a common standard arm. Leuk Res 2003;27:1071-3.
19. The Coronary Drug Project Research Group. Influence of adherence to treatment and response of cholesterol on mortality in the Coronary Drug Project. N Engl J Med 1980;303:1038-41.
20. Begg CB, Cho MK, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA 1996;276:637-9.
21. www.consort-statement.org (accessed June 30, 2006)
22. Juni P, Altman DG, Egger M. Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ 2001;323:42-6.
23. Weibull H, Bergqvist D, Bergentz SE, et al. Percutaneous transluminal renal angioplasty versus surgical reconstruction of atherosclerotic renal artery stenosis: a prospective randomized study. J Vasc Surg 1993;18:841-52.
24. Webster J, Marshall F, Abdalla M, et al. Randomised comparison of percutaneous angioplasty vs continued medical therapy for hypertensive patients with atheromatous renal artery stenosis. Scottish and Newcastle Renal Artery Stenosis Collaborative Group. J Hum Hypertens 1998;12:329-35.
25. Plouin PF, Chatellier G, Darne B, et al. Blood pressure outcome of angioplasty in atherosclerotic renal artery stenosis: a randomized trial. Essai Multicentrique Medicaments vs Angioplastie (EMMA) Study Group. Hypertension 1998;31:823-9.
26. van de Ven PJ, Kaatee R, Beutler JJ, et al. Arterial stenting and balloon angioplasty in ostial atherosclerotic renovascular disease: a randomised trial. Lancet 1999;353:282-6.
27. van Jaarsveld BC, Krijnen P, Pieterman H, et al. The effect of balloon angioplasty on hypertension in atherosclerotic renal-artery stenosis. Dutch Renal Artery Stenosis Intervention Cooperative Study Group. N Engl J Med 2000;342:1007-14.
28. Ives NJ, Wheatley K, Stowe RL, et al. Continuing uncertainty about the value of percutaneous revascularization in atherosclerotic renovascular disease: a meta-analysis of randomized trials. Nephrol Dial Transplant 2003;18:298-304.
29. Ramsay LE, Waller PC. Blood pressure response to percutaneous transluminal angioplasty for renovascular hypertension: an overview of published series. BMJ 1990;300:569-72.
30. Blum U, Hauer M, Krumme B. Percutaneous revascularization of renal artery stenosis. Balloon angioplasty vs. stent implantation. Radiologe 1999;39:135-43.
31. Isles CG, Robertson S, Hill D. Management of renovascular disease: a review of renal artery stenting in ten studies. Q J Med 1999;92:159-67.
32. Bloch MJ, Pickering T. Renal vascular disease: medical management, angioplasty, and stenting. Semin Nephrol 2000;20:474-88.
33. Plouin PF, Rossignol P, Bobrie G. Atherosclerotic renal artery stenosis: to treat conservatively, to dilate, to stent, or to operate? J Am Soc Nephrol 2001;12:2190-6.
34. Harden PN, MacLeod MJ, Rodger RSC, et al. Effect of renal-artery stenting on progression of renovascular renal failure. Lancet 1997;349:1133-6.
35. Taves DR. Minimization: a new method of assigning patients to treatment and control groups. Clin Pharmacol Ther 1974;15:443-53.
36. Pocock SJ, Simon R. Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 1975;31:103-15.