Must original investigators get authorship in re-analyses of their shared data?

Last year, in the New England Journal of Medicine, editors Dan Longo and Jeffrey Drazen made a brazen attack on the very idea of routine data sharing.

The editors introduced what they hoped would become a derogatory term of shame, “research parasites” who

 had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited.

Instead, the editorial gave rise to researchers proudly wearing “research parasites” T-shirts to conferences (I want one!), and then to an actual annual award for the best achievement by a research parasite.

An exceptionally well-written letter published in Nature Genetics, “Celebrating parasites,” describes the accomplishments of past and current winners of the annual research parasite awards.

But I think the letter will be remembered for its clearly articulated account of why secondary data analyses are necessary, and why such analyses need to be conducted and interpreted independently of the investigators who generated the data.

Some quotable quotes

The act of rigorous secondary data analysis is critical for maintaining the accuracy and efficiency of scientific discovery. As scientists, we make predictions, perform experiments and generate data to test those predictions. When we ask rigorous questions, we obtain more accurate findings that can prevent harm. For example, Vioxx was evaluated for use in treating pain associated with rheumatoid arthritis [5]. Questions were raised shortly thereafter about its cardiovascular effects [6]. Independent researchers, using data from multiple studies, identified a drug-associated increase in cardiovascular event risk [7]. These research parasites identified important side-effects of this drug, correcting incomplete information on the drug’s safety profile.

Parasites also improve efficiency: many data sets were originally designed for specific questions, but these data may also answer distinct but related questions. Investigators can refocus data sets via meta-analysis to reveal general patterns that become apparent only with many studies. Data sets can also be individually useful. New researchers can often bring their own creative ideas to existing data, leading to novel breakthroughs and disruptive innovations.

Importantly, the Nature Genetics letter explains why secondary analyses of data should not routinely involve the original investigators as gatekeepers or co-authors.

Under some proposals for data reuse, data would be shared with researchers working in concert with the investigators who initially analyzed the data [1]. We expect that this would counteract the recent focus of the US National Institutes of Health (NIH) on rigor, transparency and reproducibility. Any procedure that includes data generators as gatekeepers has the potential to compromise rigor and robustness. As gatekeepers, researchers could withhold data from those with contrary views or a reputation of challenging the status quo. We must expect data sharing to lead to some conclusions being challenged and, ultimately, refuted. If this fails to occur, it indicates a problem with the process and not the correctness of conclusions.

The struggle over release of the PACE trial data

As many readers know, I’m engaged in a struggle with the senior editors of PLOS One, who are protecting the investigators of the PACE trial of cognitive behaviour therapy and graded exercise therapy for chronic fatigue syndrome. The investigators continue to refuse to release data that they had promised would be available as a condition for publishing in PLOS One.

Release of the data is crucial because:

  1. The investigators had promised the data would be available as a condition for publishing in PLOS One.
  2. The trial was funded in large part with public funds.
  3. The investigators eventually declared financial conflicts of interest they did not disclose to the patients providing data for the trial.
  4. The investigators switched the scoring of primary subjective self-report measures after learning from colleagues conducting another trial that the primary outcomes specified in the PACE trial protocol would be unlikely to produce significant effects.
  5. The investigators suppressed objective behavioral performance and return-to-work data needed to calculate secondary recovery that would have similarly demonstrated a lack of effects of these therapies.
  6. The PLOS One article argues that cognitive behavior therapy and graded exercise therapy were cost-effective, but to be cost-effective, interventions need first to be demonstrated to be effective.
  7. The PLOS One article prominently misrepresents on the first page the investigators as having complied with data sharing policies.

The Nature Genetics letter never explicitly mentions the PACE trial. However, critics of the PACE trial achieved a major breakthrough when the Journal of Health Psychology published an open-access series of commentaries, which Harriet Hall blogged about in Science-Based Medicine.

Treating Chronic Fatigue Syndrome with Cognitive Behavioral Therapy and Graded Exercise Therapy: How the PACE Trial Got It Wrong

The struggle to access the trial data

Critics pushed for an independent review of the trial data. They submitted dozens of freedom-of-information (FOI) requests for PACE-related documents and data. The National Institutes of Health (NIH) and the Institute of Medicine (IOM) chimed in, and experts said the deconditioning hypothesis was flawed and untenable. The authors claimed they were being persecuted by patients and advocacy groups. They had received death threats and a phone threat of castration, and one had been stalked by a woman who brought a knife to one of his lectures.

When the trial data were finally made available (only after long persistence, many FOI requests, and a court order), an independent group did a preliminary analysis of “recovery” from CFS using individual participant data. It found that the previously reported recovery rates had been inflated by an average of four-fold. Re-analyzing the data according to the published trial protocol revealed that the recovery rate was 3.1% for SMC alone, 6.8% for CBT, 4.4% for GET, and 1.9% for APT. These differences were not statistically significant.

Fiona Godlee wrote in an editorial in the British Medical Journal (BMJ) that when there is enough doubt to warrant independent re-analysis, “Such independent reanalysis and public access to anonymised data should anyway be the rule, not the exception, whoever funds the trial.”

The authors of the re-analysis said,

The PACE trial provides a good example of the problems that can occur when investigators are allowed to substantially deviate from the trial protocol without adequate justification or scrutiny. We therefore propose that a thorough, transparent, and independent re-analysis be conducted to provide greater clarity about the PACE trial results. Pending a comprehensive review or audit of trial data, it seems prudent that the published trial results should be treated as potentially unsound, as well as the medical texts, review articles, and public policies based on those results.

Amen. Except I have to add that Harriet Hall got two critical points wrong.

After a legal struggle on which they spent almost 250,000 British pounds, the PACE investigators lost an appeal and were compelled to release a very limited portion of their data. These are the data that were subjected to re-analysis.

In the Lower Tribunal hearing in which the PACE investigators lost their appeal and were compelled to release the data, one investigator had to concede that reports of death threats were exaggerated and simply false.

Neither the original primary report of outcomes for the PACE trial published in The Lancet nor the claims of cost-effectiveness published in PLOS One can be independently evaluated without release of the data promised as a condition for publishing in PLOS One.

I renew my request that the senior editor at PLOS One obtain release of the data so that I and others can analyze it.

Release is important for the clinical and public policy influenced by the claimed results of the PACE trial, as well as for the reputation of PLOS One as the premier open-access mega-journal with enforced data sharing.

And no, dammit, I am not going to let the PACE investigators tell me how to conduct my forensic exploratory analyses, nor will I invite them to be authors. They would just mess up my efforts and produce results consistent with their financial conflicts of interest, as they did in their own publications.

I blog at a number of different sites including Quick Thoughts, PLOS blog Mind the Brain, and occasionally Science-Based Medicine. To keep up on my writing and speaking engagements and to get advance notice of e-books and web-based courses, please sign up at
