Unsuccessful trial accrual and human subjects protections: An empirical analysis of recently closed trials

[Figure: ratio of actual to expected enrolment versus number of trials, for trials that completed and trials that terminated due to poor accrual in 2011.]

The moral acceptability of a clinical trial is rooted in the risks and benefits it presents to patients, as well as in its ability to produce generalisable and useful scientific knowledge. A trial's claim to produce new knowledge depends in part on its ability to recruit patients to participate—the fewer the patients, the less confident we can be in the knowledge produced. So when trials have recruitment problems, those trials also have ethical problems.

In a recently published issue of Clinical Trials, my colleagues and I investigate the prevalence of poor trial accrual, the impact of accrual problems on study validity and their ethical implications.

We used the National Library of Medicine clinical trial registry to capture all initiated phase 2 and 3 interventional clinical trials that were registered as closed in 2011. We then determined how many had been terminated due to unsuccessful accrual and how many had closed after enrolling less than 85% of their target number of human subjects.
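
For the curious, here is a minimal sketch (in Python) of the kind of classification rule described above. The field names and example records are hypothetical; the actual analysis worked from the registry's own status and enrolment fields.

```python
# A minimal sketch of the accrual classification described above.
# The field names and example records are hypothetical.

def classify_accrual(trial, threshold=0.85):
    """Label a closed trial by its accrual outcome."""
    ratio = trial["actual_enrolment"] / trial["expected_enrolment"]
    if trial["status"] == "terminated" and trial["reason"] == "poor accrual":
        return "terminated_poor_accrual"
    if trial["status"] == "completed" and ratio < threshold:
        return "completed_under_target"
    return "adequately_accrued"

trials = [
    {"status": "terminated", "reason": "poor accrual",
     "actual_enrolment": 30, "expected_enrolment": 200},
    {"status": "completed", "reason": None,
     "actual_enrolment": 150, "expected_enrolment": 200},  # 75% of target
    {"status": "completed", "reason": None,
     "actual_enrolment": 210, "expected_enrolment": 200},
]

print([classify_accrual(t) for t in trials])
# ['terminated_poor_accrual', 'completed_under_target', 'adequately_accrued']
```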

Of 2579 eligible trials, 481 (19%) either terminated for failed accrual or completed with less than 85% of expected enrolment, seriously compromising their statistical power. A total of 48,027 patients had enrolled in trials closed in 2011 that were unable to answer their primary research question meaningfully.

Not only that, but we found that many trials that should have been terminated were pursued to completion despite flagging rates of subject accrual, and the proportion of trials that completed was much higher than the proportion that terminated, even at accrual levels as low as 30% (see the figure above).

The take-home message is that ethics bodies, investigators, and data monitoring committees should carefully scrutinize trial design, recruitment plans, and feasibility of achieving accrual targets when designing and reviewing trials, monitor accrual once initiated, and take corrective action when accrual is lagging.

2014 Nov

The Landscape of Early Phase Research


As Jonathan is fond of saying: Drugs are poisons. It is only through an arduous process of testing and refinement that a drug is eventually transformed into a therapy. Much of this transformative work falls to the early phases of clinical testing. In early phase studies, researchers are looking to identify the optimal values for the various parameters that make up a medical intervention. These parameters are things like dose, schedule, mode of administration, co-interventions, and so on. Once these have been locked down, the “intervention ensemble” (as we call it) is ready for the second phase of testing, where its clinical utility is either confirmed or disconfirmed in randomized controlled trials.

In our piece from this latest issue of the Kennedy Institute of Ethics Journal, Jonathan and I present a novel conceptual tool for thinking about the early phases of drug testing. As suggested in the image above, we represent this process as an exploration of a 3-dimensional “ensemble space.” Each x-y point on the landscape corresponds to some combination of parameters–a particular dose and delivery site, say. The z-axis is then the risk/benefit profile of that combination. This model allows us to re-frame the goal of early phase testing as an exploration of the intervention landscape–a systematic search through the space of possible parameters, looking for peaks that have promise of clinical utility.

We then go on to show how the concept of ensemble space can also be used to analyze the comparative advantages of alternative research strategies. For example, given that the landscape is initially unknown, where should researchers begin their search? Should they jump out into the deep end, so to speak, in the hopes of hitting the peak on the first try? Or should they proceed more cautiously–methodically working their way out from the least-risky regions, mapping the overall landscape as they go?
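
To make the idea a bit more concrete, here is a toy sketch of those two strategies on an invented landscape. Nothing below comes from the paper; the surface, the grid, and the search routines are made up purely for illustration.

```python
import numpy as np

# A toy "ensemble space": a dose x schedule grid with an invented
# risk/benefit surface (the z-axis in the figure above).
doses = np.linspace(0, 10, 21)
schedules = np.linspace(0, 10, 21)
D, S = np.meshgrid(doses, schedules)
risk_benefit = np.exp(-((D - 6) ** 2 + (S - 3) ** 2) / 8)  # one hidden peak

def cautious_search(start, n_steps=50):
    """Work outward step by step, always moving to the best neighbouring point."""
    i, j = start
    for _ in range(n_steps):
        neighbours = [(i + di, j + dj)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)
                      if 0 <= i + di < D.shape[0] and 0 <= j + dj < D.shape[1]]
        i, j = max(neighbours, key=lambda ij: risk_benefit[ij])
    return (D[i, j], S[i, j]), risk_benefit[i, j]

def bold_guess(i, j):
    """Jump into the deep end: commit to a single combination and test it."""
    return (D[i, j], S[i, j]), risk_benefit[i, j]

print(cautious_search(start=(0, 0)))  # climbs to the peak near dose 6, schedule 3
print(bold_guess(5, 5))               # may land far from the peak
```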

I won’t give away the ending here, because you should go read the article! Although readers familiar with Jonathan’s and my work can probably infer which of those options we would support. (Hint: Early phase research must be justified on the basis of knowledge-value, not direct patient-subject benefit.)

UPDATE: I’m very happy to report that this paper has been selected as the editor’s pick for the KIEJ this quarter!

2014 Jul

The Literature Isn’t Just Biased, It’s Also Late to the Party


Animal studies of drug efficacy are an important resource for designing and performing clinical trials. They provide evidence of a drug’s potential clinical utility, inform the design of trials, and establish the ethical basis for testing drugs in humans. Several recent studies suggest that many preclinical investigations are withheld from publication. Such nonreporting likely reflects the fact that private drug developers have little incentive to publish preclinical studies. However, it potentially deprives stakeholders of complete evidence for making risk/benefit judgments and frustrates the search for explanations when drugs fail to recapitulate the promise shown in animals.

In a future issue of The British Journal of Pharmacology, my co-authors and I investigate how much preclinical evidence is actually available in the published literature, and when, if at all, it makes an appearance.

Although we identified a large number of preclinical studies, the vast majority was reported only after publication of the first trial. In fact, for 17% of the drugs in our sample, no efficacy studies were published before the first trial report. And when a similar analysis was performed looking at preclinical studies and clinical trials matched by disease area, the numbers were more dismal. For more than a third of indications tested in trials, we were unable to identify any published efficacy studies in models of the same indication.

There are two possible explanations for this observation, both of which have troubling implications. Research teams might not be performing efficacy studies until after trials are initiated and/or published. Though this would seem surprising and inconsistent with ethics policies, FDA regulations do not emphasize the review of animal efficacy data when approving the conduct of phase 1 trials. Another explanation is that drug developers precede trials with animal studies, but withhold them or publish them only after trials are complete. This interpretation also raises concerns, as delay of publication circumvents mechanisms—like peer review and replication—that promote systematic and valid risk/benefit assessment for trials.

The take-home message is this: animal efficacy studies supporting specific trials are often published long after the trial itself is published, if at all. This represents a threat to human protections, animal ethics, and scientific integrity. We suggest that animal care committees, ethics review boards, and biomedical journals should take measures to correct these practices, such as requiring the prospective registration of preclinical studies or creating publication incentives that are meaningful for private drug developers.

2014 Jun

Search, Bias, Flotsam and False Positives in Preclinical Research

Photo credit: RachelEllen 2006

If you could change one thing- and only one thing- in preclinical proof of principle research to improve its clinical generalizability, what would it be? Require larger sample sizes? Randomization? Total data transparency?

In the May 2014 issue of PLoS Biology, my co-authors Uli Dirnagl and Jeff Mogil offer the following answer: clearly label preclinical studies as either “exploratory” or “confirmatory” studies.

Think of the downed jetliner Malaysia Airlines Flight 370. To find it, you need to explore vast swaths of open seas, using as few resources as possible. Such approaches are going to be very sensitive, but also prone to false positives. Before you deploy expensive, specialized ships and underwater vehicles to locate the plane, you want to confirm that the signal identified in exploration is real.

So it is in preclinical research as well. Exploratory studies are aimed at identifying strategies that might be useful for treating disease- scanning the ocean for a few promising treatment strategies. The vast majority of preclinical studies today are exploratory in nature. They use small sample sizes, flexible designs, short study durations, surrogate measures of response, and many different techniques to demonstrate an intervention’s promise. Fast and frugal, but susceptible to bias and random variation.

Right now, the standard practice is to go right into clinical development on the basis of this exploratory information. Instead, we ought to be running confirmatory studies first. These would involve prespecified preclinical designs, large sample sizes, long durations, etc. Such studies are more expensive, but can effectively rule out random variation and bias in declaring a drug promising.
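
Here is a toy simulation, not taken from the paper, that captures one piece of this argument: when an "exploratory" screen of drugs with no true effect uses small samples and keeps the best of several outcome measures, it flags far more false positives than a prespecified "confirmatory" design with a single primary outcome.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def smallest_p(n_per_group, n_outcomes):
    """Smallest two-sample t-test p-value across several null outcome measures."""
    pvals = []
    for _ in range(n_outcomes):
        a = rng.normal(size=n_per_group)  # no true treatment effect
        b = rng.normal(size=n_per_group)
        pvals.append(stats.ttest_ind(a, b).pvalue)
    return min(pvals)

n_drugs = 1000
# "Exploratory": small groups, five outcome measures, keep the best one.
exploratory = sum(smallest_p(5, 5) < 0.05 for _ in range(n_drugs))
# "Confirmatory": larger groups, one prespecified primary outcome.
confirmatory = sum(smallest_p(30, 1) < 0.05 for _ in range(n_drugs))

print(f"false-positive rate, exploratory:  {exploratory / n_drugs:.0%}")   # roughly 20-25%
print(f"false-positive rate, confirmatory: {confirmatory / n_drugs:.0%}")  # roughly 5%
```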

Our argument has implications for regulatory and IRB review of early phase studies, journal publication, and funding of research. Clearly labeling studies as one or the other would put consumers of this information on notice for the error tendencies of the study. An “exploratory” label tells reviewers that the intervention is not yet ready for clinical development- but also, that reviewers ought to relax their standards, somewhat, for experimental design and transparency. “Confirmatory,” on the other hand, would signal to reviewers and others that the study is meant to directly inform clinical development decisions- and that reviewers should evaluate very carefully whether effect sizes are confounded by random variation, bias, use of an inappropriate experimental system (i.e. threats to construct validity), or idiosyncratic features of the experimental system (i.e. threats to external validity).

2014 May

The Cost of Missing Information

The medical research enterprise produces a massive amount of information that is critical for effective medical care, public health, innovation, and sound public policy. Yet only a fraction of this information is actually captured. Recent studies show, for example, that only 17% of healthy volunteer phase 1 trials and 43% of phase 2 to 4 trials are published in scientific journals. Moreover, the trials that ultimately get published are not a random sampling of all the conducted trials. Rather, investigators and journals are biased towards publishing positive results. Famously, Peter Doshi and colleagues describe how documents unearthed through cumbersome processes dramatically altered the clinical utility picture for the influenza drug Tamiflu. They also describe the stonewalling by Roche and regulators when trial reports were sought. A real travesty.

In a 2012 report in JAMA “Clinical Trial Data as a Public Good,” Rodwin and Abramson argue for greater transparency in clinical trial reporting practices and suggest mandating the disclosure of standardized Clinical Study Reports (CSR) for all clinical trials. These CSRs are the documents that drug manufacturers produce in order to meet international and national regulatory requirements, which the authors contend are less likely to contain altered data than trials published in journals. The authors argue that mandatory disclosure would promote research integrity, is more reliable than other published summaries, and would ultimately reduce biases in biomedical research. They further justify CSR disclosure by claiming that clinical trials are public goods used by many different actors and that large public subsidies go towards supporting the drug industry in the form of drug patents, public drug insurance plans, and R&D subsidization.  We can’t think of any good policy or ethical reasons why this proposal shouldn’t be implemented.  We also think academic medical centers and public funding agencies should use their muscle to demand that sponsors commit to public reporting for all studies directly and indirectly within their jurisdiction.

What is the cost of this missing information? In the STREAM journal club, we discussed various approaches to answering this question.  Perhaps such an estimate would make more salient the impact of current policies. (photo credit:  Willi Heidelbach 2006)

2014 May

In Memoriam for Kathy Glass

I first met Kathy in August 2001 when, newly arrived in Montreal with a totally useless PhD in molecular genetics, I approached her, hat in hand, looking for a postdoctoral position in Biomedical Ethics. Actually, my hat wasn’t in hand- it was on my head- I had a week earlier accidentally carved a canyon in my scalp when I left the spacer off my electric razor. Apparently, Kathy wasn’t put off by my impertinent attire, and she hired me. That was the beginning of a beautiful mentorship and, years later, as the director of the Biomedical Ethics Unit, Kathy hired me as an Assistant Prof.

More than any one person I can think of, I owe Kathy my career. Kathy was a great teacher. She kindled in me- and others around her- a passion for research ethics, and a recognition of the way that science, method, law, and ethics constitute not separately contended arenas, but an integrated whole. After the life of her mentor- Benjy Freedman- was tragically cut short, Kathy picked up Benjy’s torch and led the Clinical Trial Research Group here at McGill. Together with her CTRG colleagues, Kathy published a series of landmark papers on such issues as the use of placebo comparators, risk and pediatric research, the (mis)conduct of duplicative trials, the testing of gene therapies- papers that belong in any respectable research ethics syllabus. I use them myself. Kathy led the CTRG- and for that matter, the BMEU- with fierce conviction and an unshakable fidelity to the weakest and most debilitated. Yet she also fostered an intellectually cosmopolitan environment, where dissenting voices were welcomed. And then softly disabused of their dissent.

Kathy was also a great mentor. She supervised 24 Master’s and doctoral students- many of whom went on to successful careers as bioethicists around the world- and many of whom show great promise as they continue their studies. Kathy also supervised 6 postdocs- 5 landed good academic jobs. Not bad. But what was most inspiring about Kathy was not her ability to energize talent. To some degree, talent runs on its own batteries. Instead, it was in the way Kathy was able to get pretty good, honest work out of less talented- but earnest- students. Kathy was elite, but not an elitist.

A mutual colleague has described Kathy as self-effacing. She took pleasure in her achievements, but still greater pleasure in the achievements of her collaborators and students. Over the last few weeks, I have fielded countless queries from colleagues far and wide- highly influential bioethicists who worked with her like Carl Elliott and Leigh Turner in Minnesota, or Michael McDonald in Vancouver. And look around you in this chapel and you will see some more leading lights of bioethics and clinical trials- Charles Weijer, Bartha Knoppers, Trudo Lemmens, Stan Shapiro to name a few. Their presence and deep affection testify to Kathy’s personal and professional impact, as does the recognition accorded by the Canadian Bioethics Society when Kathy received the Lifetime Achievement Award in 2011.

Kathy was a fundamentally decent human being. She confronted an unusual amount of personal adversity- the death of her son, her early experience with cancer and its later recurrence, the untimely death of her mentor- with courage, dignity, and a resilience that inspired all around her. Her work speaks so convincingly in part because it is informed by these personal experiences.

After her retirement and when she was able to muster the strength- and navigate the ice on Peel Street- Kathy would show up at my research group meetings and participate in discussions. I speak for my group- and also my colleagues in research ethics- when I say our universe will be smaller and a little less inviting without the presence of this gentle, inquisitive, selfless, and righteous woman.

-Jonathan Kimmelman, April 17, 2014

2014 Apr

The Ethics of Unequal Allocation


In the standard model for randomized clinical trials, patients are allocated on an equal, or 1:1, basis between two treatment arms. This means that at the conclusion of patient enrollment, there should be roughly as many patients receiving the new experimental treatment as receiving the standard treatment or placebo. This 1:1 allocation ratio is the most efficient from a statistical perspective, since it requires the smallest number of patient-subjects to achieve a given level of statistical power.

However, many recent late-phase trials of neurological interventions have randomized their participants in an unequal ratio, e.g., on a 2:1 or 3:1 basis. In the case of 2:1 allocation, this means that there are twice as many patient-subjects receiving the new (and unproven) treatment as those receiving the standard or placebo. This practice is typically justified by the assumption that it is easier to enroll patient-subjects in a trial if they believe they are more likely to receive the new/active treatment.

In an article from this month’s issue of Neurology, Jonathan and I present three arguments for why investigators and oversight boards should be wary of unequal allocation. Specifically, we argue that the practice (1) leverages patient therapeutic misconceptions; (2) potentially interacts with blinding and thereby undermines a study’s internal validity; and (3) fails to minimize overall patient burden by introducing unnecessary inefficiencies into the research enterprise. Although these reasons do not universally rule out the practice–and indeed we acknowledge some circumstances under which unequal allocation is still desirable–they are sufficient to demand a more compelling justification for its use.

The point about inefficiency reflects a trend in Jonathan’s and my work–elucidating the consequences for research ethics when we look across a series of trials, instead of just within one protocol. So to drive this point home here, consider that the rate of successful translation in neurology is estimated at around 10%. This means that for every 10 drugs that enter the clinical pipeline, only 1 will ever be shown effective. Given the limited pool of human and material resources available for research and the fact that a 2:1 allocation ratio typically requires 12% more patients to achieve a given level of statistical power, this increased sample size and cost on a per trial basis may mean that we use up our testing resources before we ever find that 1 effective drug.
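
That 12% figure is consistent with the standard sample-size relationship for a two-arm comparison: relative to 1:1 allocation, a k:1 allocation requires roughly (k + 1)^2 / 4k times the total sample size for the same power. A quick back-of-envelope check (illustrative, not from the Neurology article):

```python
# For a two-arm comparison with fixed power, the total sample size for k:1
# allocation scales relative to 1:1 by (k + 1)**2 / (4 * k).

def relative_total_n(k):
    """Total sample size for k:1 allocation, relative to a 1:1 design."""
    return (k + 1) ** 2 / (4 * k)

for k in (1, 2, 3):
    print(f"{k}:1 allocation -> {relative_total_n(k):.3f}x the 1:1 sample size")
# 1:1 -> 1.000x, 2:1 -> 1.125x (about 12% more), 3:1 -> 1.333x
```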

2014 Jan

Pharmacodynamic Studies in Drug Development: What you don’t know can hurt you

Research biopsies involve the collection of tissues for scientific endpoints. In early phase cancer trials, research biopsies are often used to assess the biological activity of a drug on a molecular level. This is called pharmacodynamics – the study of what a drug does to the body at the molecular or cellular level. Because these research procedures can burden patients but have no value for their disease management, the ethical justification for research biopsies rides on the benefit to future patients through the development of safe and effective drugs.

Working under the premise that accrual of such “knowledge value” requires the publication of findings, we previously reported that only a third of pharmacodynamic studies are published in full. This initial report, published in Clinical Cancer Research, also found that over 60% of participating research oncologists regard the reporting quality of pharmacodynamic data as fair to poor. The August issue of the British Journal of Cancer reports on the findings of our recent follow-up of that study.

In it, we find that reporting quality varies widely between studies. Many studies do not report methodologies- or results- that would enable readers of such studies to make reliable inferences about the pharmacodynamic findings. For instance, only 43% of studies reported use of blinding, 38% reported dimensions of tissues used for biopsy analysis, and 62% reported flow of patients through analysis. We also found a preponderance of “positive results,” suggesting possible publication bias.

Together, our two investigations offer a complex picture of publication and reporting quality for PD studies involving nondiagnostic biopsy. Frequent nonpublication and low reporting of basic methodological items suggest room for improvement. However, we did uncover many studies that were well reported, and a large fraction of studies (72%) described ways that PD findings might guide future investigations. In the end, the results of our studies highlight many opportunities for researchers to improve the risk/benefit balance of PD studies by improving the way they are reported.

2013 Sep

Uncaging Validity in Preclinical Research


High attrition rates in drug development bedevil drug developers, ethicists, health care professionals, and patients alike.  Increasingly, many commentators are suggesting the attrition problem partly relates to prevalent methodological flaws in the conduct and reporting of preclinical studies.

Preclinical efficacy studies involve administering a putative drug to animals (usually mice or rats) that model the disease experienced by humans.  The outcome sought in these laboratory experiments is efficacy, making them analogous to Phase 2 or 3 clinical trials.

However, that’s where the similarities end.  Unlike trials, preclinical efficacy studies employ a limited repertoire of methodological practices aimed at reducing threats to clinical generalization.  These quality-control measures, including randomization, blinding and the performance of a power calculation, are standard in the clinical realm.

This mismatch in scientific rigor hasn’t gone unnoticed, and numerous commentators have urged better design and reporting of preclinical studies.   With this in mind, the STREAM research group sought to systematize current initiatives aimed at improving the conduct of preclinical studies.  The results of this effort are reported in the July issue of PLoS Medicine.

In brief, we identified 26 guideline documents, extracted their recommendations, and classified each according to the particular validity type – internal, construct, or external – that the recommendation was aimed at addressing.   We also identified practices that were most commonly recommended, and used these to create a STREAM checklist for designing and reviewing preclinical studies.

We found that guidelines mainly focused on practices aimed at shoring up internal validity and, to a lesser extent, construct validity.  Relatively few guidelines addressed threats to external validity.  Additionally, we noted a preponderance of guidance on preclinical neurological and cerebrovascular research; oddly, none addressed cancer drug development, an area with perhaps the highest rate of attrition.

So what’s next?  We believe the consensus recommendations identified in our review provide a starting point for developing preclinical guidelines in realms like cancer drug development.  We also think our paper identifies some gaps in the guidance literature – for example, a relative paucity of guidelines on the conduct of preclinical systematic reviews.  Finally, we suggest our checklist may be helpful for investigators, IRB members, and funding bodies charged with designing, executing, and evaluating preclinical evidence.

Commentaries and lay accounts of our findings can be found in PLoS Medicine, CBC News, McGill Newsroom and Genetic Engineering & Biotechnology News.

2013 Aug

No trial stands alone

“The result of this trial speaks for itself!”

This often-heard phrase contains a troubling assumption: that an experiment can stand entirely on its own. That it can be interpreted without reference to other trials and other results. In a couple of articles published over the last two weeks, my co-authors and I deliver a one-two punch to this idea.

The first punch is thrown at the US FDA’s use of “assay sensitivity,” a concept defined as a clinical trial’s “ability to distinguish between an effective and an ineffective treatment.” This concept is intuitively appealing, since all it seems to say is that a trial should be well-designed. A well-designed clinical trial should be able to answer its question and distinguish an effective from an ineffective treatment. However, assay sensitivity has been interpreted to mean that placebo controls are “more scientific” than active controls. This is because superiority to placebo seems to guarantee that the experimental agent is effective, whereas superiority or equivalence to an active control does not rule out the possibility that both agents are actually ineffective.  This makes placebo-controlled trials more “self-contained,” easier to interpret, and therefore, methodologically superior.

In a piece in Perspectives in Biology and Medicine, Charles Weijer and I dismantle the above argument by showing, first, that all experiments rely on some kinds of “external information”–be it information about an active control’s effects, pre-clinical data, the methodological validity of various procedures, etc. Second, that a placebo can suffer from all of the same woes that might afflict an active control (e.g., the “placebo effect” is not one, consistent effect, but can vary depending upon the type or color of placebo used), so there is no guarantee of assay sensitivity in a placebo-controlled trial. And finally, the more a trial’s results can be placed into context, and interpreted in light of other trials, the more potentially informative it is.

This leads to punch #2: How should we think about trials in context? In a piece in Trials, Charles Heilig, Charles Weijer, and I present the “Accumulated Evidence and Research Organization (AERO) Model,” a graph-theoretic approach to representing the sequence of experiments and clinical trials that constitute a translational research program. The basic idea is to illustrate each trial in the context of its research trajectory using a network graph (or a directed acyclic graph, if you want to get technical), with color-coded nodes representing studies and their outcomes; and arrows representing the intellectual lineage between studies. This work is open-access, so I won’t say too much more about it here, but instead encourage you to go and give it a look. We provide a lot of illustrations to introduce the graphing algorithm, and then apply the approach to a case-study involving inconsistent results across a series of tuberculosis trials.
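
For readers who think in code, here is a minimal sketch of the kind of data structure the AERO model describes. This is not the AERO tooling itself; the study names, outcomes, and the use of networkx are just for illustration.

```python
import networkx as nx

# A research trajectory as a directed acyclic graph: nodes are studies with an
# outcome attribute, edges run from a study to the later studies it informed.
# The study names and outcomes are invented.
G = nx.DiGraph()
G.add_node("preclinical_efficacy_mouse", kind="preclinical", outcome="positive")
G.add_node("phase1_dose_finding", kind="clinical", outcome="positive")
G.add_node("phase2_trial_A", kind="clinical", outcome="negative")
G.add_node("phase2_trial_B", kind="clinical", outcome="positive")
G.add_edges_from([
    ("preclinical_efficacy_mouse", "phase1_dose_finding"),
    ("phase1_dose_finding", "phase2_trial_A"),
    ("phase1_dose_finding", "phase2_trial_B"),
])

assert nx.is_directed_acyclic_graph(G)

# Putting a trial "in context" amounts to looking at its ancestors in the graph.
print(sorted(nx.ancestors(G, "phase2_trial_B")))
# ['phase1_dose_finding', 'preclinical_efficacy_mouse']
```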

In sum: Trials should not be thought of as self-contained. This is not even desirable! Rather, all trials (or at least trials in translational medicine) should be thought of as nodes in a complex, knowledge producing network. Each one adding something to our understanding. But none ever truly “speaks for itself,” because none should ever stand alone.

2013 Jun


Our mission

The STREAM Group applies empirical and philosophical tools to address scientific, ethical, and policy challenges in the development and translation of health technologies.


Who we are

The STREAM Group is a collaboration of researchers who share a common set of principles about the goals and methods for studying clinical translation. Our members work in ethics, epidemiology, biology, psychology, and various medical specialties. The network is centered at McGill University, and has affiliates throughout North America.
