Unfunny 2017 BMJ Christmas articles not in the same league as my all-time favorite

Which is: A systematic review of parachute use to prevent death and major trauma

I agree with Sharon Begley’s assessment in Stat of this year’s BMJ Christmas issue as a loser.

A BMJ Christmas issue filled with wine glasses, sex, and back pain brings out the Grinch in us

Bah! … humbug. Is it just us, or is the highly anticipated Christmas issue of the BMJ (formerly the British Medical Journal) delivering more lumps of coal and fewer tinselly baubles lately?

Maybe it’s Noel nostalgia, but we find ourselves reminiscing about BMJ offerings from Yuletides past, which brought us studies reporting that 0.5 percent of U.S. births are to (self-reported) virgins, determining how long a box of chocolates lasts on a hospital ward, or investigating Nintendo injuries.

I agree with her stern note:

Note to BMJ editors: Fatal motorcycle crashes, old people falling, and joint pain — three of this year’s Christmas issue studies — do not qualify as “lighthearted.”

My all-time favorite BMJ Christmas article

Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials


With its conclusions:

As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

The brief article is actually a great way to start a serious discussion of randomized trials with a laugh.

We can all agree that we wouldn’t participate in a randomized trial of parachutes. Any effort to conduct a systematic review and meta-analysis of such studies would end up, formally speaking, as a failed meta-analysis. We could start with a rigorous systematic search, but still end up with no studies to provide effect sizes. That’s not a bad thing, especially as an alternative to making recommendations based on weak or nonexistent data.

I would like to see a lot more formally declared failed meta-analyses by the Cochrane Collaboration. Clearly labeled failed meta-analyses are much preferable to recommendations to consumers and policymakers for treatments based on a small collection of methodologically weak and underpowered trials. That happens far too often.

If the discussion group were ripe for it, we could delve into when randomized trials are not needed or what to do when there aren’t randomized trials. I think one or more N = 1 trials not using a parachute or similar device would be compelling, without a nonspecific control group. On the other hand, many interventions that have been justified only by observational studies turn out not to be effective when an RCT is finally done.

Keep a discussion going long enough about when RCTs can’t provide suitable evidence, and you end up in a predictable place. Someone will offer a critique of RCTs as the gold standard for evaluating interventions, or maybe of systematic reviews and meta-analyses of RCTs as a platinum standard. That can be very fruitful too, but sooner or later someone proposes alternatives to the RCT because their pet interventions don’t measure up in RCTs. Ah, yes, RCTs can’t capture the magic going on in long-term psychodynamic psychotherapy.

[For a review of alternatives to RCTs that I co-authored, see Research to improve the quality of care for depression: alternatives to the simple randomized clinical trial]

Ghosts of Christmases Past

Searching for past BMJ Christmas articles can be tedious. If someone can suggest an efficient search term, let me know. Fortunately, the BMJ last year offered a review of all-time highlights:

Christmas crackers: highlights from past years of The BMJ’s seasonal issue

BMJ 2016; 355 doi: https://doi.org/10.1136/bmj.i6679 (Published 15 December 2016)

Cite this as: BMJ 2016;355:i6679

For more than 30 years the festive issue of the journal has answered quirky research questions, waxed philosophical, and given us a good dose of humour and entertainment along the way.

A recent count found more than 1000 articles in The BMJ’s Christmas back catalogue. A look through these shows some common themes returning year after year. Professional concerns crop up often, and we seem to be endlessly fascinated by the differences between medical specialties. Past studies have looked at how specialties vary by the cars they drive,7 their ability to predict the future,8 and their coffee buying habits.9 Sometimes the research findings can challenge popular stereotypes. How many people, “orthopods” included, could have predicted that anaesthetists, with their regular diet of Sudoku and crosswords, would fare worse than orthopaedic surgeons in an intelligence test?1

It notes some of the recurring themes.

Beyond medical and academic matters, enduring Christmas themes also reflect the universal big issues that preoccupy us all: food, drink, religion, death, love, and sex.

This broad theme encompasses one of the most widely accessed BMJ Christmas articles of all time:

In 2014 Ben Lendrem and colleagues explored differences between the sexes in idiotic risk taking behaviour, by studying past winners of the Darwin Awards.4 As the paper describes: winners of these awards must die in such an idiotic manner that “their action ensures the long-term survival of the species, by selectively allowing one less idiot to survive.”

There is also an interesting table of the four BMJ Christmas papers that won Ig Nobel prizes:

Christmas BMJ papers awarded the Ig Nobel prize

  • Effect of ale, garlic, and soured cream on the appetite of leeches (winner 1994)15

  • Magnetic resonance imaging of male and female genitals during coitus and female sexual arousal (1999)13

  • Sword swallowing and its side effects (2006)16

  • Pain over speed bumps in diagnosis of acute appendicitis (2012)17




A meta analysis of interventions training non-mental health professionals to deal with mental health problems

When garbage in isn’t necessarily garbage out.

I often see statements that meta-analyses can be no better than the literature on which they draw. The point is often underscored with something like “garbage in, garbage out” (GI,GO).

I certainly agree that we should not conduct synthetic meta-analyses aimed at isolating a single effect size when the available literature consists of small, similarly flawed studies. Yet that is what is done entirely too often.

Many Cochrane Collaboration reviews depend on only a small handful of randomized controlled trials. The reviews typically acknowledge the high risk of bias, but policymakers, clinicians, and the advocates for the treatments that are reviewed seize on the encouraging effect sizes, ignoring the limited quantity and quality of evidence.

In part that’s a failure of Cochrane, but in part it also reflects how hungry consumers are for confident reviews of the literature, when such reassurance is just not possible.

I think a meta-analysis of a literature characterized by mediocre studies can be valuable if it doesn’t attempt to provide an overall effect size, but only to identify the ways in which the current literature is bad and how it should be corrected. These are analytic or diagnostic meta-analyses, with the diagnostic assessment being applied to the adequacy of the existing literature and how it can be improved.

That’s why I think a recent review of anti-stigma and other mental health training programs for non-mental health professionals is so valuable.

Booth A, Scantlebury A, Hughes-Morley A, Mitchell N, Wright K, Scott W, McDaid C. Mental health training programmes for non-mental health trained professionals coming into contact with people with mental ill health: a systematic review of effectiveness. BMC Psychiatry. 2017 May 25;17(1):196.

The review is explicit about the low quality of the literature, pointing out that most studies don’t even evaluate the key question of whether people with mental health problems who come in contact with these professionals benefit from what are often expensive intervention programs.

The review also points out that many studies use idiosyncratic and inadequately validated outcome measures to assess whether interventions work. That is inexcusable because the existing literature is readily available to those designing such studies, along with validated measures of effects on the target population.

At best, these intervention programs only have short-term benefits for the attitudes of the professionals receiving them, with little assessment of long-term benefits or of impact on the target population. This is hardly a recommendation for large-scale programs without better evidence that they work.

Agencies funding expensive intervention programs often require evaluation components. Too bad that those conducting such programs don’t fulfill their responsibility to provide an adequate demonstration that what they are being paid to provide actually works.

We should be quite skeptical of the claims that are made for anti-stigma and other educational programs targeting non-mental health professionals. The burden of proof is on those who market such programs, and the conflict of interest in making extravagant claims should be recognized.

We should get real about the unrealistic assumptions behind such programs. Namely, the programs are predicated on the assumption that we can select a few professionals, expose them to brief interventions with unvalidated content, and expect the professionals to react expertly when suddenly thrown into situations involving persons with mental health problems. The intervention programs are typically too weak and unfocused. The programs don’t prepare professionals very well to respond effectively to unexpected, but fortunately infrequent, encounters in which how they perform is so critically important.

I was once asked to apply for an NIMH grant that would prepare primary care physicians to respond more effectively to older patients who were suicidal, but not expressing their intent directly. I declined to submit an application after I calculated that physicians would encounter such a situation only about once every 18 months. It would take a huge randomized trial to demonstrate any effect. But NIMH nonetheless funded a trial doomed to being uninformative from before it even enrolled the first physician.
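The arithmetic behind that judgment can be sketched in a few lines of Python. The once-every-18-months rate is the estimate given above; the trial lengths and the target number of encounters are purely illustrative assumptions, not figures from the NIMH trial, and the function names are hypothetical.

```python
import math

# From the text: roughly one relevant encounter per physician every 18 months.
MONTHS_BETWEEN_ENCOUNTERS = 18


def expected_encounters(n_physicians, trial_months,
                        months_between=MONTHS_BETWEEN_ENCOUNTERS):
    """Expected number of relevant encounters across all enrolled physicians."""
    return n_physicians * trial_months / months_between


def physicians_needed(target_encounters, trial_months,
                      months_between=MONTHS_BETWEEN_ENCOUNTERS):
    """Physicians required to expect a given number of encounters (rounded up)."""
    return math.ceil(target_encounters * months_between / trial_months)


if __name__ == "__main__":
    # Illustrative assumption: a 12-month trial aiming for 300 encounters in total.
    print(physicians_needed(target_encounters=300, trial_months=12))
    # Illustrative assumption: 100 physicians followed for 18 months.
    print(expected_encounters(n_physicians=100, trial_months=18))
```

Even before asking how many of those encounters would need to go differently to show an effect, the number of physicians grows quickly, which is the point of the calculation described above.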

What a systematic search yielded and what could be concluded

From 8578 search results, 19 studies met the inclusion criteria: one systematic review, 12 RCTs, three prospective non-RCTs, and three non-comparative studies.

The training interventions identified included broad mental health awareness training and packages addressing a variety of specific mental health issues or conditions. Trainees included police officers, teachers and other public sector workers.

Some short term positive changes in behaviour were identified for trainees, but for the people the trainees came into contact with there was little or no evidence of benefit.


A variety of training programmes exist for non-mental health professionals who come into contact with people who have mental health issues. There may be some short term change in behaviour for the trainees, but longer term follow up is needed. Research evaluating training for UK police officers is needed in which a number of methodological issues need to be addressed.

The studies included in the systematic review were all conducted in the USA. Eight of the 19 primary studies included took place in the USA, three in Sweden, three in England, two in Australia, and one each in Canada, Scotland and Northern Ireland. Participants included teachers, public health professionals, university resident advisors, community practitioners, public sector staff, and case workers. Law enforcement participants were trainee, probationary, university campus, and front line police officers.

The review noted that there isn’t an excuse for poor assessment of outcomes in these programs:

A recent systematic review of the measurement properties of tools measuring mental health knowledge recommends using tools with an evidence base which reach the threshold for positive ratings according to the COSMIN checklist [42].

How APA’s rating of acceptance and commitment therapy for psychosis got downgraded from “strong” to “modest” efficacy

A few years ago my blog post caused a downgrading of ACT for psychosis that stuck. This shows the meaninglessness of APA ratings of psychotherapies as evidence-supported.

Steve Hayes came into my twitter feed urging me to take a fresh look at the evidence for the efficacy of acceptance and commitment therapy (ACT).

I clicked on the link he provided and I was underwhelmed.

I was particularly struck by the ratings of ACT by the American Psychological Association Division 12.

I also noticed that ACT for psychosis was still rated only modestly supported.

A few years ago ACT was rated “strongly supported.” This rating was immediately downgraded to “modestly supported” after I exposed a single study as p-hacked in a series of blog posts and in discussions on Facebook.

That incident sheds light on the invalidity of ratings by the American Psychological Association Division 12 of the evidence-supported status of therapies.

Steve Hayes’ Tweet


Clicking on the link Hayes provided took me to

State of the ACT Evidence



The APA ratings were prominently displayed above a continuously updated list of reviews and studies.

American Psychological Association, Society of Clinical Psychology (Div. 12), Research Supported Psychological Treatments:

Chronic Pain – Strong Research Support
Depression – Modest Research Support
Mixed anxiety – Modest Research Support
Obsessive-Compulsive Disorder – Modest Research Support
Psychosis – Modest Research Support
For more information on what the “modest” and “strong” labels mean, click here

Only ACT for Chronic Pain was rated as having strong support. But that rating seemed to be contradicted by the newest systematic review that was listed:

Simpson PA, Mars T, Esteves JE. A systematic review of randomised controlled trials using Acceptance and commitment therapy as an intervention in the management of non-malignant, chronic pain in adults. International Journal of Osteopathic Medicine. 2017 Jun 30;24:18-31.

That review was unable to provide a meta-analysis because of the poor quality of the 10 studies that were available and their heterogeneity.

My previous complaints about how the evidence for treatments is evaluated by APA

There are low thresholds for professional groups such as the American Psychological Association Division 12 or governmental organizations such as the US Substance Abuse and Mental Health Services Administration (SAMHSA) declaring treatments to be “evidence-supported.” Seldom are any treatments deemed ineffective or harmful by these groups.

Professional groups have conflicts of interest in wanting their members to be able to claim the treatments they practice are evidence-supported, while not wanting to restrict practitioner choice with labels of treatment as ineffective. Other sources of evaluation like SAMHSA depend heavily and uncritically on what promoters of particular psychotherapies submit in applications for “evidence supported status.”

My account of how my blogging precipitated a downgrading of ACT for psychosis

Now you see it, now, you don’t: “Strong evidence” for the efficacy of acceptance and commitment therapy for psychosis

On September 3, 2012 the APA Division 12 website announced a rating of “strong evidence” for the efficacy of acceptance and commitment therapy for psychosis. I was quite skeptical. I posted links on Facebook and Twitter to a series of blog posts (1, 2, 3) in which I had previously debunked the study claiming to demonstrate that a few sessions of ACT significantly reduced rehospitalization of psychotic patients.

David Klonsky, a friend on FB who maintains the Division 12 treatment website, quickly contacted me and indicated that he would reevaluate the listing after reading my blog posts and that he had already contacted the section editor to get her evaluation. Within a day, the labeling was changed to “designation under re-review as of 9/3/12” and it is now (10/16/12) “modest research support.”

My exposure of a small, but classic study of ACT for psychosis having been p-hacked

The initial designation of ACT as having “strong evidence” for psychosis was mainly based on a single, well promoted study, claims for which made it all the way to Time magazine when it was first published.

Bach, P., & Hayes, S.C. (2002). The use of acceptance and commitment therapy to prevent the rehospitalization of psychotic patients: A randomized controlled trial. Journal of Consulting and Clinical Psychology, 70, 1129-1139.

Of course, the designation of strong evidence requires support of two randomized trials, but the second trial was a modest attempt at replication of this study and was explicitly labeled as a pilot study.

The Bach and Hayes article has been cited 175 times as of 10/21/12 according to ISI Web of Science, mainly for claims that appear in its abstract: patients receiving up to four sessions of an ACT intervention had “a rate of rehospitalization half that of TAU [treatment as usual] participants over a four-month follow-up [italics added].” This would truly be a powerful intervention, if these claims are true. And my check of the literature suggests that these claims are almost universally accepted. I’ve never seen any skepticism expressed in peer reviewed journals about the extraordinary claim of cutting rehospitalization in half.

  • It is not clear that rehospitalization was originally set as the primary outcome, and so there is a possible issue of a shifting primary outcome, a common tactic in repackaging a null trial as positive. Many biomedical journals require that investigators publish their protocols with a designated primary outcome before they enter the first patient into a trial. That is a strictly enforced requirement for later publication of the results of the trial. But that is not yet usually done for RCTs testing psychotherapies. The article is based on a dissertation. I retrieved a copy and I found that the title of it seemed to suggest that symptoms, not rehospitalization, were the primary outcome: Acceptance and Commitment Therapy in the Treatment of Symptoms of Psychosis.
  • Although 40 patients were assigned to each group, analyses only involved 35 per group. The investigators simply dropped patients from the analyses with negative outcomes that are arguably at least equivalent to rehospitalization in their seriousness: committing suicide or going to jail. Think about it: what should we make of a therapy that prevented rehospitalization but led to jailing and suicides of mental patients? This is not only a departure from intention to treat analyses, but the loss of patients is nonrandom and potentially quite relevant to the evaluation of the trial. Exclusion of these patients has a substantial impact on the interpretation of results: the 5 patients missing from the ACT group represented 71% of the reported rehospitalizations and the 5 patients missing from the TAU group represented 36% of the reported rehospitalizations in that group.
  • Rehospitalization is not a typical primary outcome for a psychotherapy study. But if we suspend judgment for a moment as to whether it was the primary outcome for this study, ignore the lack of intent to treat analyses, and accept 35 patients per group, there is still not a simple, significant difference between groups for rehospitalization. The claim of “half” is based on voodoo statistics.
  • The trial did assess the frequency of psychotic symptoms, an outcome that is closer to what one would rely on to compare this trial with the results of other interventions. Yet oddly, patients receiving the ACT intervention actually reported more symptoms, twice the frequency reported by patients in TAU. The study also assessed how distressing hallucinations or delusions were to patients, what would be considered a patient oriented outcome, but there were no differences on this variable. One would think that these outcomes would be very important to clinical and policy decision-making and these results are not encouraging.
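The points about dropped patients and the lack of a simple significant difference can be checked with a rough sketch. The event counts below are not quoted from the paper: they are back-calculated from the percentages above (5 dropped patients amounting to about 71% and 36% of the reported rehospitalizations implies roughly 7 and 14 events among the 35 analyzed patients per group), so treat them as assumptions. The helper names are hypothetical, and Fisher's exact test is swapped in as one reasonable choice, not necessarily the test the investigators used.

```python
from math import comb


def implied_total(dropped, share):
    """Nearest whole number of events such that dropped / events is about `share`."""
    return round(dropped / share)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups; the first column counts events, the second non-events.
    """
    row1, row2, events = a + b, c + d, a + c
    denom = comb(row1 + row2, events)
    lo, hi = max(0, events - row2), min(row1, events)
    probs = {k: comb(row1, k) * comb(row2, events - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    act_events = implied_total(5, 0.71)   # inferred from the text, not a reported count
    tau_events = implied_total(5, 0.36)   # inferred from the text, not a reported count
    print("Implied rehospitalizations (ACT, TAU):", act_events, tau_events)

    # Analysis as published: 35 patients per group.
    print("p, 35 per group:",
          fisher_two_sided(act_events, 35 - act_events,
                           tau_events, 35 - tau_events))

    # Sensitivity check (assumption): count the 5 dropped patients in each
    # group as bad outcomes and analyze all 40 randomized per group.
    print("p, 40 per group:",
          fisher_two_sided(act_events + 5, 40 - (act_events + 5),
                           tau_events + 5, 40 - (tau_events + 5)))
```

The second scenario is only one way of handling the missing patients. It follows the argument above that suicide or jail is at least as serious as rehospitalization.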

Another study, which has been cited 64 times [at the time] according to ISI Web of Science, rounded out the pair needed for a designation of strong support:

Gaudiano, B.A., & Herbert, J.D. (2006). Acute treatment of inpatients with psychotic symptoms using acceptance and commitment therapy: Pilot results. Behaviour Research and Therapy, 44, 415-437.

Appropriately framed as a pilot study, this study started with 40 patients and only delivered three sessions of ACT. The comparison condition was enhanced treatment as usual consisting of psychopharmacology, case management, and psychotherapy, as well as milieu therapy. Follow-up data were available for all but 2 patients. But this study is hardly the basis for rounding out a judgment of ACT as efficacious for psychosis.

  • There were assessments with multiple conventional psychotic symptom and functioning measures, as well as ACT-specific measures. The only conventional measure to achieve significance was distress related to hallucinations and there were no differences in ACT specific measures. There were no significant differences in rehospitalization.

  • The abstract puts a positive spin on these findings: “At discharge from the hospital, results suggest short-term advantages in affective symptoms, overall improvement, social impairment, and distress associated with hallucinations. In addition, more participants in the ACT condition reached clinically significant symptom improvement at discharge. Although four-month rehospitalization rates were lower in the ACT group, these differences did not reach statistical significance.”

I noted at the time:

The provisional designation of ACT as having strong evidence of efficacy for psychosis could have had important consequences. Clinicians and policymakers could decide that merely providing three sessions of ACT is a sufficient and empirically validated approach to keep chronic mental patients from returning to the hospital and maybe even make discharge decisions based on whether patients had received ACT. But the evidence just isn’t there that ACT prevents rehospitalization, and when the claim is evaluated against what is known about the efficacy of psychotherapy for psychotics, it appears to be an unreasonable claim bordering on the absurd.



Conflicts of interest in Cochrane reports on psychological interventions


Recently I was honored to join an esteemed group of international colleagues in writing to the Cochrane about the Collaboration’s inattention to conflicts of interest in reviews of psychotherapeutic interventions.

The Collaboration has been particularly lax in dealing with conflicts of interest with respect to psychological interventions for “chronic fatigue syndrome” and medically unexplained symptoms. The group obviously don’t apply the same standards that they would to industry’s involvement in evaluations of medical interventions. Why should psychotherapy be different?

In 2017 I will push the Cochrane to do a better job of protecting their valuable reviews from the taint of conflicts of interest, both declared and undeclared.

For background, please see my previous posts –

An open letter to the Cochrane Collaboration: Bill Silverman lies a-moldering in his grave

Why the Cochrane Collaboration needs to clean up conflicts of interest

 I elicited a reply from David Tovey, the Editor in Chief of the Cochrane Library:

To which I responded:

My response to an invitation to improve the Cochrane Collaboration by challenging its policies

The letter with an international group of psychotherapy researchers and meta-analysts

Conflicts of interest in Cochrane reports on psychological interventions

Winfried Rief (GER), Gerhard Andersson (SWE), Juergen Barth (SWI), James Coyne (US), Pim Cuijpers (NL), Stefan G. Hofmann (USA), Klaus Lieb (GER)

Conflicts of interests are a major threat to the validity of clinical trials, meta-analyses and Cochrane reports. Accordingly, people with close links to pharmaceutical companies such as Novartis are typically not invited to chair a Cochrane review on methylphenidate in ADHD, members of Pfizer are not chairing Cochrane reports on sildenafil, etc. However, Cochrane analyses on psychological interventions allow strong conflicts of interests of the chairing experts. Conflicts of interests in psychotherapy might be responsible for controversial and heated debates beyond scientific evidence, and financial involvements and interests can be also substantial 1.

The influence of personal preferences in original clinical trials is strong and a robust finding. This well-known influence is often called the allegiance effect. If experts are highly identified with a specific treatment approach, their scientific reports notoriously overestimate the effect sizes of this treatment. A recent analysis showed a robust and moderate allegiance effect on outcome reports (Cohen’s d=.54) 2. If meta-analyses aggregate these biased original study reports, again mainly steered by the same scientists who are over-identified with this approach, the bias is further amplified in the corresponding Cochrane analysis.

Therefore, we discourage that authors with a strong allegiance for one therapeutic intervention analyze and summarize their favorite approach in Cochrane reports. For example, the person who co-developed cognitive therapy (Aaron T. Beck) should not write the Cochrane analysis on cognitive therapy; Steven Hayes should not author a Cochrane analysis on Acceptance and Commitment Therapy, which was primarily  developed by himself, and for which he expressed a serious interest that this should be broadly disseminated; Gerhard Andersson should not review internet interventions, after a major part of published trials in this field originate from his group; Peter Fonagy and Falk Leichsenring should not chair Cochrane reviews on psychodynamic psychotherapies, after they published a series of papers all expressing a strong interest that these types of therapies should be better acknowledged.

Two conclusions can be drawn. First, the most ambitious proponents for specific treatments should not have a major influence in Cochrane reports on this intervention. Existing Cochrane reports fulfilling this criterion should be excluded from the Cochrane databases. Second, conflicts of interests of any expert who contributes to Cochrane analyses of psychological interventions should be assessed by the Cochrane group and declared by the authors. Criteria for this type of conflict of interest that should be reported could be: the author developed one of the treatments that is examined in the meta-analysis; the author wrote a treatment manual that is examined in the meta-analysis; the author gave workshops or keynote lectures on one of the treatments or is leading a respective psychotherapy institute; the author published comments in favor of one of the treatments or recommended one of the treatments over another; original studies of the author are included in the Cochrane report.

As far as we understood the Cochrane initiative, it is supposed to provide robust and critical information to the public and to health care providers. However, this can only be achieved if no obvious conflicts of interest of the authors are evident, or if conflicts of interest are balanced between proponents and more critical participants. While the Cochrane initiative started already attempts to control for allegiance effects, these effects need to be controlled more rigorously. All authors should declare potential conflicts of interest for reasons of transparency, while experts with strong allegiance to one treatment should not be included in Cochrane reports about this treatment at all.


  1. Lieb K, von der Osten-Sacken J, Stoffers-Winterling J, Reiss N, Barth J. Conflicts of interest and spin in reviews of psychological therapies: a systematic review. BMJ Open 2016; 6(4).
  2. Munder T, Brütsch O, Leonhart R, Gerger H, Barth J. Researcher allegiance in psychotherapy outcome research: An overview of reviews. Clinical Psychology Review 2013; 33: 501-11.

I will soon be offering scientific writing courses on the web as I have been doing face-to-face for almost a decade. Sign up at my new website to get notified about these courses, as well as upcoming blog posts at this and other blog sites. Get advance notice of forthcoming e-books and web courses. Lots to see at CoyneoftheRealm.com.

Interventions to reduce stress in university students: Anatomy of a bad meta-analysis

Regehr C, Glancy D, Pitts A. Interventions to reduce stress in university students: A review and meta-analysis. Journal of Affective Disorders. 2013 May 15;148(1):1-1.

I saw a link to this meta-analysis on Twitter. I decided to take a look.

The experience added to my sense that many people who tweet, retweet or “like” tweets about studies have either not read the studies or lack a basic understanding of research, or both.

What I can redeem from my experience is a commentary that is another contribution to screening and quickly dismissing bad meta-analyses.

I have written at length about screening out bad meta-analyses in Hazards of pointing out bad meta-analyses of psychological interventions.

In that blog post, I provide an excellent rationale for complaining about bad meta-analyses from Hilda Bastian. If you click on her name below, you can access an excellent blog post about bad meta-analyses from Hilda as well.

Psychology has a meta-analysis problem. And that’s contributing to its reproducibility problem. Meta-analyses are wallpapering over many research weaknesses, instead of being used to systematically pinpoint them. – Hilda Bastian

Unfortunately, this meta-analysis is behind a paywall. If you have access through a university library, that does not pose a problem, only the inconvenience of having to log into your university library website. If you are motivated to do so, you could request a PDF by emailing one of the authors at Cheryl.regehr@utoronto.ca.

I don’t think you need to go to the trouble of writing the authors to benefit from my brief analysis, particularly because you can see the start of the problems by accessing the abstract here.

And here’s the abstract:



Recent research has revealed concerning rates of anxiety and depression among university students. Nevertheless, only a small percentage of these students receive treatment from university health services. Universities are thus challenged with instituting preventative programs that address student stress and reduce resultant anxiety and depression.


A systematic review of the literature and meta-analysis was conducted to examine the effectiveness of interventions aimed at reducing stress in university students. Studies were eligible for inclusion if the assignment of study participants to experimental or control groups was by random allocation or parallel cohort design.


Retrieved studies represented a variety of intervention approaches with students in a broad range of programs and disciplines. Twenty-four studies, involving 1431 students were included in the meta-analysis. Cognitive, behavioral and mindfulness interventions were associated with decreased symptoms of anxiety. Secondary outcomes included lower levels of depression and cortisol.


Included studies were limited to those published in peer reviewed journals. These studies over-represent interventions with female students in Western countries. Studies on some types of interventions such as psycho-educational and arts based interventions did not have sufficient data for inclusion in the meta-analysis.


This review provides evidence that cognitive, behavioral, and mindfulness interventions are effective in reducing stress in university students. Universities are encouraged to make such programs widely available to students. In addition however, future work should focus on developing stress reduction programs that attract male students and address their needs.

I immediately saw that this was a bad abstract because it was so uninformative. There are so many abstracts of meta-analyses freely available on the web. We need to be given the information to recognize when we are confronting an abstract of a bad meta-analysis, so that we can move on. I feel strongly that authors have a responsibility to make their abstracts informative. If they don’t in their initial manuscripts, editors and reviewers should insist on improving the abstracts as a condition for publication.

This abstract is faulty because it does not give the effect sizes to back up its claims about the effectiveness of interventions to reduce stress in university students. It also does not comment in any way on the methodological quality of the 24 studies that were included. To the unwary reader, it makes the policy recommendation of making stress reduction programs available to students and maybe tailoring such programs so they will attract males.

The authors of abstracts making such recommendations have a responsibility to give some minimal details of the quality of the evidence behind the recommendation. These authors do not.

When I accessed the article through my University library, I immediately encountered in the opening of the introduction:

On September 5, 2012, a Canadian national news magazine ran a cover story entitled “Mental Health Crisis on Campus: Canadian students feel hopeless, depressed, even suicidal” (1). The story highlighted a 2011 survey at University of Alberta in which over 50% of 1600 students reported feeling hopeless and overwhelming anxiety over the past 12 months. The story continued by recounting incidents of suicide across Canadian campuses. The following month, the CBC reported a survey conducted at another Canadian university indicating that 88.8% of the students identified feeling generally overwhelmed, 50.2% stated that they were overwhelmed with anxiety, 66.1% indicated they were very sad, and 34.2% reported feeling depressed (2).

These are startling claims and they require evidence. Unfortunately, the only evidence provided consists of citations to secondary news sources.

Authors making such strong claims in a peer-reviewed article have a responsibility to provide appropriate documentation. In this particular case, I don’t believe that such extreme statements even belong in a supposedly scholarly peer-reviewed article.

A section headed Data Analysis seems to provide encouragement that the authors knew what they were doing.

A meta-analysis was conducted to pool change in the primary outcome (self-reported anxiety) and secondary outcomes (self-reported depression and salivary cortisol level) from baseline to the post-intervention period using Comprehensive Meta-analysis software, version 2.0. All data were continuous and analyzed by measuring the standard mean difference between the treatment and comparison groups based on the reported means and standard deviations for each group. Standard mean differences (SMD) allowed for comparisons to be made across studies when scales measured the same outcomes using different standardized instruments, such as administering the STAI or the PSS to measure anxiety. Standard mean differences were determined by calculating the Hedges’ g. The Hedges’ g is preferable to Cohen’s d in this instance, as it includes an adjustment for small sample bias. To pool SMDs, inverse variance methods were used to weigh each effect size by the inverse of its variance to obtain an overall estimate of effect size. Standard differences in means (SDMs) point estimates and 95% confidence intervals (CIs) were computed using a random effects model. Heterogeneity between studies was calculated using I². This statistic provides an estimate of the percentage of variability in results across studies that are likely due to treatment effect rather than chance.
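For readers who want to see what that paragraph amounts to in practice, here is a minimal sketch in Python. It is not the authors’ code and not how Comprehensive Meta-analysis is implemented: the three studies and all their numbers are invented, the function names are hypothetical, and I am assuming the DerSimonian-Laird estimator for the random effects model because the paper does not say which one was used.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for two independent groups."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled                # Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def random_effects(effects):
    """Pool (g, variance) pairs; DerSimonian-Laird is an assumption here."""
    w = [1 / v for _, v in effects]          # inverse-variance weights
    fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, (g, _) in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)       # between-study variance
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * g for wi, (g, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Invented data: (mean, SD, n) for intervention, then control.
# Lower scores mean less anxiety, so a negative g favors the intervention.
studies = [
    (38.0, 9.0, 12, 45.0, 10.0, 7),
    (41.0, 10.0, 30, 44.0, 11.0, 28),
    (40.0, 8.0, 45, 42.0, 9.0, 47),
]
effects = [hedges_g(*s) for s in studies]
pooled, ci, i2 = random_effects(effects)
print(f"pooled g = {pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, I2 = {i2:.0f}%")
```

Note that nothing in this arithmetic checks whether the studies fed into it are any good. Note also that I² is usually described as the percentage of variation across studies that reflects real differences between studies (heterogeneity) rather than chance.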

Unfortunately, anyone can download a free trial of the Comprehensive Meta-Analysis software, now in a newer version 3.0, and get the manual with it. The software is easy to use, perhaps too easy. One can use it to write a paper without really knowing much about conducting and interpreting a meta-analysis. You could put garbage into it, and the software would not register a protest.

The free manual provides text that could be paraphrased without knowing too much about meta-analysis.

When I’m evaluating a meta-analysis, I quickly go to the table of studies that were included. In the case of this meta-analysis, I immediately saw a problem in the description of the first study:


The sample size of 12 students assigned to the intervention and 7 to the control group was much too small to be taken seriously. Any effect size would be unreliable, and could change drastically with adding or subtracting one study participant. Because the study had been published, it undoubtedly claimed a positive effect, and that would undoubtedly mean something dodgy had been done. Moreover, if you have seven participants in the control group, and you get significant results, the effect size will be quite large, because it takes a large effect to get statistical significance with only seven participants in the control group.
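To put a rough number on “quite large,” here is a small sketch of the smallest standardized mean difference that could reach two-sided significance at the .05 level with groups of 12 and 7. The function name is hypothetical, and I am assuming a normal approximation to keep the example self-contained; a proper t test with so few participants would require a somewhat larger effect still.

```python
import math
from statistics import NormalDist

def min_significant_d(n1, n2, alpha=0.05):
    """Smallest standardized mean difference reaching two-sided significance.

    Uses a normal approximation (an assumption made for simplicity).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * math.sqrt(1 / n1 + 1 / n2)

print(round(min_significant_d(12, 7), 2))   # 12 intervention vs 7 control
print(round(min_significant_d(35, 35), 2))  # 35 per group, for comparison
```

With 12 and 7 participants, the threshold comes out at roughly nine tenths of a standard deviation, which is a large effect by conventional standards. Any smaller effect simply could not have been reported as significant.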

Reviewing the rest of the table, I can see that the bulk of the 24 included studies were similarly small, with only a few meeting my usual requirement for being taken seriously: at least 35 participants in the smaller of the intervention or control group. Even 35 participants per group gives a researcher only about a 50% probability of detecting a moderate-sized effect of the intervention if it is present. If a literature is generating significant moderate-sized effects more than 50% of the time with such small studies, it is seriously flawed.
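The 35-per-group rule of thumb can be checked with a quick power calculation. This is a sketch under stated assumptions: I take “moderate” to mean a standardized difference of 0.5 (Cohen’s convention), use a two-sided test at the .05 level, and rely on a normal approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n_per_group / 2)  # expected z statistic under the true effect
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

for n in (7, 12, 35, 64):
    print(n, round(power_two_group(0.5, n), 2))
```

Under these assumptions, 35 per group gives power only a little above one half, and groups of 7 or 12 give far less. A literature of such small studies that nonetheless reports mostly significant results is telling us something about selective publication or flexible analysis, not about the intervention.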

Nowhere do the authors tell us if the numbers they give in this table represent patients who were randomized or patients for whom data were available at the completion of the study. Most of the studies do not have a 50:50 ratio of intervention to control participants. Was that deliberate, by design, or was it arrived at through loss of participants?

The gold standard for an RCT is an intention-to-treat analysis, in which all patients who were randomized have data available for follow-up, or some acceptable procedure has been used to estimate their missing data.

It is absolutely important that meta-analyses indicate whether or not the results of the RCTs that were entered into them were from intention-to-treat analyses.

It is considered a risk of bias for an RCT not to be able to provide an intention-to-treat analysis.

It is absolutely required that meta-analyses provide ratings of risk of bias by any of a standard set of procedures. I was reassured when I saw that the authors of the meta-analysis stated: “The assessment of the methodological quality of each study was based on criteria established in the Cochrane Collaboration Handbook.” Yet searching, nowhere could I see further mention of these assessments or how they had been used in the meta-analysis, if at all. I was left puzzling. Did the authors not do such risk-of-bias assessments, despite having said they had conducted them? Did they for some reason leave them out of the paper? Why didn’t an editor or reviewer catch this discrepancy?

Okay, case closed. I don’t recommend giving serious consideration to a meta-analysis that depends so heavily on small studies. I don’t recommend giving serious consideration to a meta-analysis that does not take risk of bias into account, particularly when there is some concern that the studies available may not be of the best quality. Readers are welcome to complain to me that I have been too harsh in evaluating the study. However, the authors are offering policy recommendations, claiming the authority of a meta-analysis, and they have not made a convincing case that the literature was appropriate or that their analyses were appropriate.

I’m sorry, but it is my position that people publishing papers and making any sort of claims have a responsibility to know what they are doing. If they don’t know, they should be doing something else.

And I don’t know why the editor or reviewers did not catch the serious problems. Journal of Affective Disorders is a peer-reviewed, Elsevier journal. Elsevier is a hugely profitable publishing company, which justifies the high cost of subscriptions because of the assurance that it gives of quality peer review. But this is not the first time that Elsevier has let us down.

Mindfulness-based stress reduction to improve the mental health of breast cancer patients

A systematic review and meta-analysis claimed a moderate-to-large effect of mindfulness-based stress reduction (MBSR) among breast cancer patients for perceived stress, depression, and anxiety.

  • The article recommended MBSR be considered an “option as part of their rehabilitation to help maintain a better quality of life in the longer term.”
  • I screened the article and concluded that its conclusions were biased and estimates of the efficacy of MBSR were likely inflated. The quick exercise demonstrates tools that readers can readily apply for themselves to other meta-analyses and particularly to meta-analyses of mindfulness-based treatments, which are prone to low quality. You can read more about some tips for screening out bad meta-analyses from further consideration here.
  • This exercise adds to the weight of concerns that we cannot trust the “scientific” mindfulness literature. We need the literature to be scrutinized by researchers not making money or having investment in the promotion of mindfulness-based treatments.

The article appears in a pay walled journal but is available through a repository identified on Google Scholar. According to Google Scholar, the 2013 meta-analysis has already been cited an impressive 90 times.

 Zainal NZ, Booth S, Huppert FA. The efficacy of mindfulness‐based stress reduction on mental health of breast cancer patients: a meta‐analysis. Psycho‐Oncology. 2013 Jul 1;22(7):1457-65.

The abstract announces its conclusions:

On the basis of these findings, MBSR shows a moderate to large positive effect size on the mental health of breast cancer patients and warrants further systematic investigation because it has a potential to make a significant improvement on mental health for women in this group.

But the abstract also disclosed a paucity of data on which this conclusion was based:

Nine published studies (two randomised controlled trials, one quasi-experimental case–control study and six one-group, pre-intervention and post-intervention studies) up to November 2011 that fulfilled the inclusion criteria were analysed. The pooled effect size (95% CI) for MBSR on stress was 0.710 (0.511–0.909), on depression was 0.575 (0.429–0.722) and on anxiety was 0.733 (0.450–1.017).

I was skeptical already. We know from the comprehensive US Agency for Healthcare Research and Quality (AHRQ) report on mindfulness, Meditation Programs for Psychological Stress and Well-Being, that there are thousands of studies of mindfulness-based treatments, but few that are of adequate sample size and methodological quality. The exhaustive search produced 18,753 citations, but only 47 randomized controlled trials (RCTs; 3%) that included an active control treatment.

  1. Is the meta-analysis limited to RCTs? The answer should be “Yes, of course.” but the answer is “No.” Only a minority of the studies (2) are RCTs.

RCTs are preferred for evaluating psychological interventions over other evaluations of treatments.

Moreover, efforts to combine effect sizes from RCTs with those from non-RCTs are generally problematic and produce inflated estimates.

The problem with effect sizes obtained from non-RCTs is that they are likely to be exaggerated by a host of nonspecific factors. But to understand that, let’s first consider what an effect size from an RCT provides.

The important principle is that treatments do not have effect sizes, but comparisons between active treatments and control conditions occurring in RCTs do. Appropriate effect sizes obtained from an RCT are between-group differences in outcomes. A comparison control group allows some controlling for nonspecific factors and any natural improvement in outcomes that would occur with the passage of time. These are particularly important issues for studies of cancer patients, because a robust literature indicates that initial levels of psychological distress decline in the absence of treatment.

So, the within-group effect sizes available from non-RCTs cannot readily be adjusted for these factors and will be exaggerated estimates of the efficacy of the treatment, a problem that carries over when they are combined with effect sizes from RCTs.
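A toy calculation makes the point concrete. The numbers below are invented purely for illustration, the function names are hypothetical, and I am assuming one simple way of computing each kind of effect size: pre-post change divided by the baseline standard deviation for the within-group version, and the difference between groups at follow-up divided by the pooled standard deviation for the between-group version.

```python
import math

def within_group_d(pre_mean, post_mean, sd_pre):
    """Pre-post change in the treated group alone, in baseline SD units."""
    return (pre_mean - post_mean) / sd_pre

def between_group_d(post_treat, sd_treat, post_ctrl, sd_ctrl):
    """Treated vs control at follow-up, in pooled SD units (equal group sizes assumed)."""
    sd_pooled = math.sqrt((sd_treat**2 + sd_ctrl**2) / 2)
    return (post_ctrl - post_treat) / sd_pooled

# Invented distress scores (higher = worse). Both groups start at 20.
# The untreated group improves to 15 on its own; the treated group reaches 13.
d_within = within_group_d(pre_mean=20, post_mean=13, sd_pre=10)
d_between = between_group_d(post_treat=13, sd_treat=10, post_ctrl=15, sd_ctrl=10)
print(f"within-group d = {d_within:.2f}, between-group d = {d_between:.2f}")
```

In this made-up scenario the one-group, pre-post design credits the treatment with all of the improvement, including the part that would have happened anyway, while the comparison with a control group leaves only a small difference. Averaging the two kinds of numbers together produces an estimate that means neither thing.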

We already know that evaluations of mindfulness-based treatments have a serious problem: control groups are typically inadequate and lead to exaggerated estimates of the efficacy of these treatments. Now these authors have compounded the problem by combining estimates of efficacy from RCTs that are likely exaggerated with those from studies that don’t even have the benefit of between-group comparisons. The credibility of this meta-analysis is in serious jeopardy.

If I were simply searching the literature for an understanding of how effective mindfulness-based treatments are for cancer patients, I would simply move on and find another source.

A broad search yielded few suitable studies.

The authors reported systematically searching nine electronic databases using the search terms ‘mindfulness’ or ‘mindfulness-based stress reduction’ and ‘breast cancer’, and their efforts yielded 625 studies. That’s a lot, but they were able to quickly screen out most of them (n=592) based on examining titles and abstracts. Reasons for exclusion were:

  • Not MBSR intervention (n=107)
  • MBSR mixed with other intervention (n=14)
  • Non cancer populations (n=310)
  • Commentaries or review or systematic review or meta analyses (n=133)
  • Psychometric measurement (n=28).

That left 33 articles, of which they were able to exclude 24:

  • Mixed cancer populations (n=19)
  • Not studying effect on mental health (n=2)
  • Multiple publications (n=2)

So now we’re down to 9 studies. Personally, I would have excluded all but the two RCTs.

Lengacher CA, Johnson‐Mallard V, Post‐White J, Moscoso MS, Jacobsen PB, Klein TW, Widen RH, Fitzgerald SG, Shelton MM, Barta M, Goodman M. Randomized controlled trial of mindfulness‐based stress reduction (MBSR) for survivors of breast cancer. Psycho‐Oncology. 2009 Dec 1;18(12):1261-72.

This was a study comparing 40 survivors of breast cancer assigned to MBSR to 42 survivors remaining in usual care.

Henderson VP, Clemow L, Massion AO, Hurley TG, Druker S, Hébert JR. The effects of mindfulness-based stress reduction on psychosocial outcomes and quality of life in early-stage breast cancer patients: a randomized trial. Breast cancer research and treatment. 2012 Jan 1;131(1):99-109.

The second study compared three groups: 53 early-stage breast cancer patients assigned to MBSR, 52 to a nutritional education program and 58 assigned to usual care.

These two RCTs at least met my usual criteria of having 35 patients per group, which means they had better than a 50-50 chance of detecting a moderate effect if it were present. But how was methodological quality taken into account?

  1. How was the methodological quality of the studies taken into account?

It was ignored.

It is important to consider methodological quality when conducting a meta-analysis. Methodologically poor studies produce higher estimates of efficacy. We know from the AHRQ report that most studies of mindfulness are of poor quality. We should be particularly concerned about whether investigators were appropriately blinded to randomization procedures so that they did not influence patient assignment. We should also be concerned about whether data for all patients entering the trial were available at follow-up or whether there was appropriate compensation for any loss. That would allow the gold standard intention-to-treat analyses. Particularly when conducted with cancer patients, studies often lose substantial numbers of patients to follow-up and lose any benefits of randomization.

The authors were already in trouble by including mostly nonrandomized trials, which have their own risk of bias. But the authors simply ignored any consideration of risk of bias, further damaging the credibility of their analyses.

Figure 3 of the article presents effect sizes for all nine studies included in the meta-analysis. We can see that the Lengacher et al, 2009 study did not have a significant effect on depression or anxiety, only perceived stress. The Henderson et al, 2011 study did not measure perceived stress or anxiety, only depression, and the effect size was not significant.

Below, I have excerpted the display of effect sizes for perceived stress. As can be seen, the significant overall effect is driven by two small, nonrandomized trials. It’s not surprising that nonrandomized trials would appear to have larger effect sizes, given the manner in which their effect sizes are calculated.


So, we have a meta-analysis of nine studies, only two of which are RCTs. There are no ratings of methodological quality of the studies. Considering past mindfulness research, the methodological quality is expected to be poor and needs to be taken into account. Neither of the two RCTs had significant effects on the mental health outcomes and both were of at least minimally required sample size. The overall effect sizes are driven by small, underpowered, nonrandomized trials. A different conclusion would be reached by limiting consideration to the two randomized trials, but only two trials would not be a good basis for a meta-analysis.

So, I’m inclined to dismiss the claims of this meta-analysis as extravagant and to exclude it from further consideration. Case closed.

  1. Who are the authors? Might they have undeclared conflicts of interest?

The senior author, Felicia A. Huppert, is Founder and Director of the Well-being Institute and Emeritus Professor of Psychology at University of Cambridge, as well as a member of the academic staff of the Institute for Positive Psychology and Education of the Australian Catholic University. She is also an author of the study of mindfulness training for schoolchildren that was featured in my last blog post on the UK Mindful Nation report [http://blogs.plos.org/mindthebrain/2016/11/16/unintended-consequences-of-universal-mindfulness-training-for-schoolchildren/ ]. Recall that the Mindful Nation report cited Professor Huppert’s study along with another one as sole support for the efficacy of mindfulness training for students with “the lowest levels of executive control and emotional stability.” Yet a critical review revealed that the pair of studies were methodologically poor and had absolutely null results.

I’m frustrated by repeatedly going to the literature and finding methodologically inferior mindfulness studies, which are then evaluated by merchants of mindfulness in flawed meta-analyses that conclude that mindfulness is highly effective and ready for dissemination. Schoolchildren, and in this case, cancer patients, are being misled. Clinicians and policymakers are being misled.

A high level of skepticism is warranted in approaching the mindfulness literature, and glowing conclusions about its effectiveness, particularly from those having a financial or professional interest in promoting mindfulness, should be dismissed.

What can I (and you) do about this flawed review?

Psycho-Oncology is the official journal of the International Psycho-Oncology Society (IPOS). It is strongly biased toward presenting a positive view of what mental health professionals can provide cancer patients, ignoring weaknesses in the evidence. I previously reported my unsuccessful complaints about a biased review claiming that psychotherapy promotes survival of cancer patients that was published without any peer review. I also reported my failed efforts to publish a letter to the editor concerning a flawed meta-analysis of couples interventions for cancer patients. As indicated in the title of the blog post, I got shushed. The letter was initially invited and accepted, and then withdrawn because of the complaints of the author. The Editor then promoted the article that I was complaining about by offering free access to what was otherwise behind a pay wall. Finally, the Journal does not accept letters to the editor or corrective actions undertaken more than six weeks after the article has been published.

But there’s still something I can do: I can post a comment at PubMed Commons detailing the shortcomings and unreliability of this meta-analysis.

I have done so and you can see the comment here. Now, when someone is doing a literature search and comes across the entry for the study on PubMed, they will be alerted that a comment has been made and they can read my comment. And you can comment yourself.

Viva post-publication peer-review that is not controlled by editors!

A systematic review of mindfulness-based stress reduction for fibromyalgia that I really like

I am preparing a keynote address, Mindfulness Training for Physical Health Problems, to deliver at the World Congress of Behavioural and Cognitive Therapies in Melbourne on Friday, June 24, 2016. I have been despairing about the quality of both the clinical trials and systematic reviews of mindfulness treatments that I have been encountering.

Mindfulness training, mindfulness-based cognitive therapy, and mindfulness-based stress reduction (MBSR) are hot topics. That means that studies get published with obvious methodological problems ignored, and premature and exaggerated claims are rewarded. Articles are prone to spin and confirmation bias, not only in reporting the results of a particular study, but also in what past studies get cited or are buried, depending on whether they support all the enthusiasm.

It is difficult to get a fair evidence-based appraisal of MBSR for clinicians, patients, and policy makers. There is a bandwagon rushing far ahead of what best evidence supports. Aside from all the other problems, the literature is being hijacked by enthusiastic promoters with undisclosed conflicts of interest who hype what they don’t tell us they are offering for sale elsewhere. Clear declarations of conflicts of interest, please.

We know that MBSR is better than no treatment. But there is only weak and inconsistent evidence telling us whether MBSR is better than other active treatments delivered with the same intensity and the same positive expectations.

Most often, MBSR is compared to a waitlist control or treatment as usual. Depending on the context, we don’t know if these control groups are actually the opposite of placebos: nocebos. Patients agreed to participate in a study that gave them a chance to get MBSR. Left in the (unblinded) control condition, they got nothing except having to be assessed repeatedly. They are going to be disappointed and this reaction is going to register in the self-report outcome data received from them.

Also, the ill-described routine care or treatment as usual being provided may be so inadequate that we are only witnessing MBSR compensating for poor quality treatment, rather than making an active contribution of its own. This was particularly true when mindfulness training was used to taper patients from antidepressants. Patients receiving MBSR and tapering were compared to patients remaining in the routine primary care in which they had been placed on antidepressants some time ago. At the time of recruitment, many patients were simply being ignored, with minimal or no monitoring of whether they were taking their medication or whether they were even still depressed. It’s not clear whether reassessing whether the medication is still of any benefit and providing support for tapering would have accomplished as much as or more than MBSR accomplished, without requiring daily practice or a full-day retreat.

Data describing treatment as usual or routine care control conditions are readily available, but almost never reported in studies evaluating MBSR.

To take another example, when patients with chronic back pain are being recruited from primary care, their long-term routine care eventually lacks support, positive expectations, or encouragement and may become iatrogenic because of guidelines requiring escalating, futile interventions. Here too, putting back in some support and realistic expectations may work as well as more complicated interventions.

Some members of the audience in Melbourne surely anticipate a relentlessly critical perspective on mindfulness from me. They will be surprised when I present limitations of current literature, but also positive recommendations for how future studies can be improved.

We need less research evaluating MBSR, but of a better quality.

There is far too much bad mindfulness research being done and uncritically cited and being put into systematic reviews. A meta-analysis cannot overcome the limitations of individual trials, if the bulk of the studies being integrated share the same problems. Garbage in, garbage out is a bit too harsh, but communicates a valid concern.

I think it is very important that meta-analyses of a hot topic like MBSR not become overly focused on summary effect sizes. Such effect sizes are inevitably inflated because of a dependence on at best a few small studies with a high risk of bias, which includes the allegiance of overenthusiastic investigators. These effect sizes are best ignored. It is better instead to identify the gaps and limitations in the existing literature, and how they can be corrected.

Stumbling on a quality review of MBSR for fibromyalgia.

I was quite pleased to stumble upon a review and meta-analysis of MBSR for fibromyalgia. Although it is published in a pay walled journal, a PDF is available at ResearchGate.

 Lauche R, Cramer H, Dobos G, Langhorst J, Schmidt S. A systematic review and meta-analysis of mindfulness-based stress reduction for the fibromyalgia syndrome. Journal of Psychosomatic Research. 2013 Dec 31;75(6):500-10.

Here’s the abstract:

Objectives: This paper presents a systematic review and meta-analysis of the effectiveness of mindfulness-based stress reduction (MBSR) for FMS.

Methods: The PubMed/MEDLINE, Cochrane Library, EMBASE, PsychINFO and CAMBASE databases were screened in September 2013 to identify randomized and non-randomized controlled trials comparing MBSR to control interventions. Major outcome measures were quality of life and pain; secondary outcomes included sleep quality, fatigue, depression and safety. Standardized mean differences and 95% confidence intervals were calculated.

Results: Six trials were located with a total of 674 FMS patients. Analyses revealed low quality evidence for short-term improvement of quality of life (SMD=−0.35; 95% CI −0.57 to −0.12; P=0.002) and pain (SMD=−0.23; 95% CI −0.46 to −0.01; P=0.04) after MBSR, when compared to usual care; and for short-term improvement of quality of life (SMD=−0.32; 95% CI −0.59 to −0.04; P=0.02) and pain (SMD=−0.44; 95% CI −0.73 to −0.16; P=0.002) after MBSR, when compared to active control interventions. Effects were not robust against bias. No evidence was further found for secondary outcomes or long-term effects of MBSR. Safety data were not reported in any trial.
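One benefit of an abstract that actually reports effect sizes and confidence intervals is that readers can check the numbers for themselves. The sketch below backs out the standard error and a two-sided p value from each reported interval. The function name is hypothetical, and I am assuming the intervals are symmetric and based on a normal approximation, which is the usual case for pooled SMDs but is not something the abstract states.

```python
from statistics import NormalDist

def p_from_ci(smd, lower, upper, level=0.95):
    """Back out the standard error and two-sided p value from a reported CI."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - level) / 2)
    se = (upper - lower) / (2 * z_crit)   # CI width is 2 * z_crit * SE
    p = 2 * (1 - z.cdf(abs(smd) / se))
    return se, p

# (label, SMD, CI lower, CI upper, reported P), as given in the abstract
reported = [
    ("quality of life vs usual care", -0.35, -0.57, -0.12, 0.002),
    ("pain vs usual care", -0.23, -0.46, -0.01, 0.04),
    ("quality of life vs active control", -0.32, -0.59, -0.04, 0.02),
    ("pain vs active control", -0.44, -0.73, -0.16, 0.002),
]
for label, smd, lo, hi, p_reported in reported:
    se, p = p_from_ci(smd, lo, hi)
    print(f"{label}: SE = {se:.3f}, recomputed p = {p:.3f} (reported {p_reported})")
```

The recomputed values should land close to the reported ones, with small differences expected because the interval bounds in the abstract are rounded to two decimals. The same check is impossible for an abstract that gives no effect sizes at all.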

Conclusions: This systematic review found that MBSR might be a useful approach for FMS patients. According to the quality of evidence only a weak recommendation for MBSR can be made at this point. Further high quality RCTs are required for a conclusive judgment of its effects.

I will be blogging about MBSR for fibromyalgia in the future, but now I simply want to show off the systematic review and meta-analysis and point to some of its unusual strengths.

A digression: What is fibromyalgia?

Fibromyalgia syndrome is a common and chronic disorder characterized by widespread pain, diffuse tenderness, and a number of other symptoms. The word “fibromyalgia” comes from the Latin term for fibrous tissue (fibro) and the Greek ones for muscle (myo) and pain (algia).

Although fibromyalgia is often considered an arthritis-related condition, it is not truly a form of arthritis (a disease of the joints) because it does not cause inflammation or damage to the joints, muscles, or other tissues. Like arthritis, however, fibromyalgia can cause significant pain and fatigue, and it can interfere with a person’s ability to carry on daily activities. Also like arthritis, fibromyalgia is considered a rheumatic condition, a medical condition that impairs the joints and/or soft tissues and causes chronic pain.

You can find out more about fibromyalgia from a fact sheet from the US National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) that is only minimally contaminated by outdated notions of fibromyalgia being a psychosomatic condition, i.e., all in the head, or recommendations for unproven complementary and alternative medicines.

 What I like about this systematic review and meta-analysis.

 The authors convey familiarity with the standards for conducting and reporting systematic reviews and meta-analyses, recommendations for the grading of evidence, and guidelines specific to the particular topic, fibromyalgia. They also admit that they had not registered their protocol. No one is perfect, and it is important for authors to indicate that they are aware of standards, even when they do not meet them. Readers can decide for themselves how to take this into account.

This review was planned and conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA) [15], the recommendations of the Cochrane Musculoskeletal Group [16,17] and the GRADE recommendations (Grading of Recommendations Assessment, Development and Evaluation) [18]. The protocol was not registered in any database.

The authors also laid out key features of systematic review and meta-analyses where you would expect to find them with explicit headings: eligibility criteria, search strategy, study selection and data collection including risk of bias in individual studies, etc.

Designation of primary and secondary outcomes.

Fibromyalgia causes pain and fatigue, disrupting quality of life. These are the outcomes in which patients and their healthcare providers will be most interested. Improvement in pain should be given the priority. However, in clinical trials of MBSR for fibromyalgia, investigators often administer a full battery of measures, and select the ones that are positive, even if they’re not the outcomes that will be most important to patients and providers. For instance, the first report from one trial focused on depressive symptoms. Designating depressive symptoms as the primary outcome ignored that not all patients with fibromyalgia have heightened depressive symptoms, and depression is not their primary concern. Moreover, the paper reporting this clinical trial is inconsistent with its registration, where a full range of other outcomes were designated as primary. Ugh, such papers defeat the purpose of having their protocols registered.

In the review under discussion, depressive symptoms were designated as a secondary outcome, along with sleep and fatigue.

 Compared to what?

The review clearly distinguished waitlist/routine care from active comparison treatments and provided separate effect sizes.

The review also indicated whether the patients had been randomized to MBSR versus comparison treatment, and explicitly indicated that any significant effects for MBSR disappeared when only randomized trials were considered.

 Strength of recommendation.

The review took into account the small number of studies (4 randomized and 2 non-randomized trials with a total of 674 patients) and the low quality of evidence in grading the recommendation that it was making:

According to GRADE, only a weak recommendation could be made for the use of MBSR for FMS, mainly due to the small number of studies and low quality of evidence.

 Summary of main results.

The article presents a series of forest plots [How to read one] that graphically display the unambiguous results, showing weak effects of mindfulness in the short term but none in the long term. For instance:

[Figure: forest plots for pain, short term and long term]

This meta-analysis found low quality evidence for small effects of MBSR on quality of life and pain intensity in patients with fibromyalgia syndrome, when compared to usual care control groups or active control groups. Effects however were not robust against bias. Finally, data on safety were not reported in any study.

 Agreements and disagreements with other systematic reviews.

The few other reviews of MBSR for fibromyalgia are of poor quality. So, the authors of this review discuss results in the context of the larger literature of MBSR for physical health problems.

 Implication for further research.

Too often, reviews of fashionable psychological interventions for health problems end with an obligatory positive assessment and, of course, “further research is needed.”

Enthusiasts assume that MBSR is good for whatever ails you. MBSR training can help you cope, even if it doesn’t actually address your physical health problem. I really liked that this review paused to reflect on why MBSR should be expected to be the treatment of choice, and on the need to make sure that relevant process and outcome variables are being assessed.

Patients with fibromyalgia seek to relieve their debilitating pain and accompanying fatigue, or at least to resume some semblance of the normal life they have lost to their condition. It is important that results of MBSR research allow informed decisions about whether it is worth the effort for patients and providers to get involved in MBSR, or whether it would simply be more burden with uncertain results.

One major implication for future research is that researchers should bear in mind that MBSR primarily aims to establish a mindful and accepting pain coping style rather than to reduce the intensity of pain or other complaints. Therefore researchers are encouraged to select custom outcomes such as awareness, acceptance or coping rather than intensity of symptom which might not reflect the intention of the intervention. Only two trials measured coping, however, only one of them actually reported results and the other one [47] did not provide data but stated that besides catastrophizing there were no significant group differences. Results of the trial by Grossmann et al. [49] on the other hand indicated significant improvements on several subscales, which could be worth further investigations.

Further high quality RCTs comparing MBSR to established therapies (e.g. defined drug treatment, cognitive behavioral therapy) are also required for the conclusive judgment.


This systematic review found low quality evidence for a small short term improvement of pain and quality of life after MBSR for fibromyalgia, when compared to usual care or active control interventions. No evidence was found for long-term effects.

Not much spin here, nor any basis yet for recommending MBSR for fibromyalgia as ready for implementation in routine care.


Probing an untrustworthy Cochrane review of exercise for “chronic fatigue syndrome”

Updated April 24, 2016, 9:21 AM US Eastern daylight time: An earlier version of this post had mashed together discussion of the end-of-treatment analyses with the follow-up analyses. That has now been fixed. The implications are even more serious for the credibility of this Cochrane review.

From my work in progress


My ongoing investigation so far has revealed that a 2016 Cochrane review misrepresents how the review was done and what was found in key meta-analyses. These problems are related to an undeclared conflict of interest.

The first author and spokesperson for the review, Lillebeth Larun, is also the first author on the protocol for a Cochrane review that has not yet been published.

Larun L, Odgaard-Jensen J, Brurberg KG, Chalder T, Dybwad M, Moss-Morris RE, Sharpe M, Wallman K, Wearden A, White PD, Glasziou PP. Exercise therapy for chronic fatigue syndrome (individual patient data) (Protocol). Cochrane Database of Systematic Reviews 2014, Issue 4. Art. No.: CD011040.

At a meeting organized and financed by PACE investigator Peter White, Larun obtained privileged access to data that the PACE investigators have spent tens of thousands of pounds to keep most of us from viewing. Larun used this information to legitimize outcome switching or p-hacking favorable to the PACE investigators’ interests. The Cochrane review  misled readers in presenting how some analyses were conducted that were crucial to its conclusions.

One of the crucial functions of Cochrane reviews is to protect policymakers, clinicians, researchers, and patients from the questionable research practices that trial investigators use to promote a particular interpretation of their results. This Cochrane review fails miserably in this respect. The Cochrane is complicit in endorsing the PACE investigators’ misinterpretation of their findings.

A number of remedies should be implemented. The first could be for Cochrane Editor in Chief and Deputy Chief Director Dr. David Tovey to call publicly for release for independent reanalysis of the PACE trial data from The Lancet original outcomes paper and the follow-up data reported in Lancet Psychiatry.

Given the breach in trust with the readership of Cochrane that has occurred, Dr. Tovey should announce that the individual patient-level data used in the ongoing review will be released for independent re-analysis.

Larun should be removed from the Cochrane review that is in progress. She should recuse herself from further comment on the 2016 review. Her misrepresentations and comments thus far have tarnished the Cochrane’s reputation for unbiased assessment and correction when mistakes are made.

An expression of concern should be posted for the 2016 review.

The 2016 Cochrane review of exercise for chronic fatigue syndrome:

 Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2016; CD003200.

Added only three studies that were not included in a 2004 Cochrane review of five studies:

Wearden AJ, Dowrick C, Chew-Graham C, Bentall RP, Morriss RK, Peters S, et al. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial. BMJ 2010; 340 (1777):1–12. [DOI: 10.1136/bmj.c1777]

Hlavaty LE, Brown MM, Jason LA. The effect of homework compliance on treatment outcomes for participants with myalgic encephalomyelitis/chronic fatigue syndrome. Rehabilitation Psychology 2011;56(3):212–8.

White PD, Goldsmith KA, Johnson AL, Potts L, Walwyn R, DeCesare JC, et al. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. The Lancet 2011; 377:611–90.

This blog post concentrates on the subanalyses that are crucial to the conclusions of the 2016 review, reported on pages 68 and 69 as Analyses 1.1 and 1.2.

I welcome others to extend this scrutiny to other analyses in the review, especially those for the SF-36 (parallel Analyses 1.5 and 1.6).

Analysis 1.1. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 1 Fatigue (end of treatment).

The only subanalysis that involves new studies includes the Wearden et al. FINE trial, the White et al. PACE trial, and an earlier study, Powell et al. The meta-analysis gives 27.2% weight to Wearden et al. and 62.9% weight to White et al., or 90.1% weight to the pair.

 Inclusion of the Wearden et al FINE trial in the meta-analysis

The Cochrane review evaluates risk of bias for Wearden et al. on page 49:

[Image: the review’s risk of bias entry for selective reporting in Wearden et al.]

This is untrue.

Cochrane used a ‘Likert’ scoring method (0,1,2,3), but the original Wearden et al. paper reports using the…

11 item Chalder et al fatigue scale,19 where lower scores indicate better outcomes. Each item on the fatigue scale was scored dichotomously on a four point scale (0, 0, 1, or 1).

This would seem a trivial difference, but this outcome switching will take on increasing importance as we proceed.

Based on a tip from Robert Courtney, I found the first mention of a re-scoring of the Chalder fatigue scale in the Wearden study in a BMJ Rapid Response:

 Wearden AJ, Dowrick C, Chew-Graham C, Bentall RP, Morriss RK, Peters S, et al. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial. BMJ, Rapid Response 27 May 2010.

The explanation that was offered for the re-scoring in the Rapid Response was:

Following Bart Stouten’s suggestion that scoring the Chalder fatigue scale (1) 0123 might more reliably demonstrate the effects of pragmatic rehabilitation, we recalculated our fatigue scale scores.

“Might more reliably demonstrate…”? Where I come from, we call this outcome switching, p-hacking, a questionable research practice, or simply cheating.

In the original reporting of the trial, effects of exercise were not significant at follow-up. With the rescoring of the Chalder fatigue scale, these results now become significant.

A physician who suffers from myalgic encephalomyelitis (ME) – what both the PACE investigators and the Cochrane review term “chronic fatigue syndrome” – sent me the following comment:

I have recently published a review of the PACE trial and follow-up articles and according to the Chalder Fatigue Questionnaire, when using the original bimodal scoring I only score 4 points, meaning I was not ill enough to enter the trial, despite being bedridden with severe ME. After changing the score in the middle of the trial to Likert scoring, the same answers mean I suddenly score the minimum number of 18 to be eligible for the trial yet that same score of 18 also meant that without receiving any treatment or any change to my medical situation I was also classed as recovered on the Chalder Fatigue Questionnaire, one of the two primary outcomes of the PACE trial.

So according to the PACE trial, despite being bedridden with severe ME, I was not ill enough to take part, ill enough to take part and recovered all 3 at the same time …

Yet according to Larun et al. there’s nothing wrong with the PACE trial.
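To see how the same set of answers can yield such different pictures, here is a minimal sketch in Python. The answer pattern is hypothetical, chosen only because it reproduces the bimodal 4 and Likert 18 that the physician describes. The cut-offs are the ones quoted in this post (bimodal ≥ 6 to enter PACE, bimodal ≥ 4 for abnormal fatigue, Likert ≤ 18 for the post-hoc recovery threshold), and the function and constant names are my own, not anything from the trial papers.

```python
# Chalder fatigue scale: 11 items, each answered on a four-point scale (coded 0-3).

def bimodal_score(answers):
    """Original scoring: each item counts 0, 0, 1, or 1 (total range 0-11)."""
    return sum(1 if a >= 2 else 0 for a in answers)

def likert_score(answers):
    """Revised scoring: each item counts 0, 1, 2, or 3 (total range 0-33)."""
    return sum(answers)

# Cut-offs as quoted in this post; treating them as simple thresholds is an assumption.
PACE_ENTRY_BIMODAL_MIN = 6        # bimodal >= 6: fatigued enough to enter PACE
ABNORMAL_FATIGUE_BIMODAL_MIN = 4  # bimodal >= 4: abnormal fatigue
PACE_RECOVERY_LIKERT_MAX = 18     # Likert <= 18: post-hoc recovery threshold

# Hypothetical answers, NOT real patient data: seven items at 1, three at 3, one at 2.
answers = [1] * 7 + [3, 3, 3, 2]

bimodal = bimodal_score(answers)
likert = likert_score(answers)

print("bimodal score:", bimodal)
print("Likert score:", likert)
print("meets bimodal entry criterion:", bimodal >= PACE_ENTRY_BIMODAL_MIN)
print("abnormal fatigue on bimodal scoring:", bimodal >= ABNORMAL_FATIGUE_BIMODAL_MIN)
print("meets post-hoc recovery threshold:", likert <= PACE_RECOVERY_LIKERT_MAX)
```

With these answers the bimodal score is 4 and the Likert score is 18, so one and the same questionnaire counts as abnormal fatigue under the original scoring and as within the recovery threshold under the revised scoring.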

Inclusion of the White et al PACE trial in the meta-analysis

Results of the Wearden et al FINE trial were available to the PACE investigators when they performed the controversial switching of outcomes for their trial. This should be taken into account in interpreting Larun’s defense of the PACE investigators in response to a comment from Tom Kindlon. She stated:

 You particularly mention the risk of bias in the PACE trial regarding not providing pre-specified outcomes however the trial did pre-specify the analysis of outcomes. The primary outcomes were the same as in the original protocol, although the scoring method of one was changed and the analysis of assessing efficacy also changed from the original protocol. These changes were made as part of the detailed statistical analysis plan (itself published in full), which had been promised in the original protocol. These changes were drawn up before the analysis commenced and before examining any outcome data. In other words they were pre-specified, so it is hard to understand how the changes contributed to any potential bias.

I think that what we have seen here so far gives us good reason to side with Tom Kindlon versus Lillebeth Larun on this point.

Also relevant is an excellent PubMed Commons comment by Sam Carter, Exploring changes to PACE trial outcome measures using anonymised data from the FINE trial. His observations about the Chalder fatigue questionnaire:

White et al wrote that “we changed the original bimodal scoring of the Chalder fatigue questionnaire (range 0–11) to Likert scoring to more sensitively test our hypotheses of effectiveness” (1). However, data from the FINE trial show that Likert and bimodal scores are often contradictory and thus call into question White et al’s assumption that Likert scoring is necessarily more sensitive than bimodal scoring.

For example, of the 33 FINE trial participants who met the post-hoc PACE trial recovery threshold for fatigue at week 20 (Likert CFQ score ≤ 18), 10 had a bimodal CFQ score ≥ 6 so would still be fatigued enough to enter the PACE trial and 16 had a bimodal CFQ score ≥ 4 which is the accepted definition of abnormal fatigue.

Therefore, for this cohort, if a person met the PACE trial post-hoc recovery threshold for fatigue at week 20 they had approximately a 50% chance of still having abnormal levels of fatigue and a 30% chance of being fatigued enough to enter the PACE trial.

A further problem with the Chalder fatigue questionnaire is illustrated by the observation that the bimodal score and Likert score of 10 participants moved in opposite directions at consecutive assessments i.e. one scoring system showed improvement whilst the other showed deterioration.

Moreover, it can be seen that some FINE trial participants were confused by the wording of the questionnaire itself. For example, a healthy person should have a Likert score of 11 out of 33, yet 17 participants recorded a Likert CFQ score of 10 or less at some point (i.e. they reported less fatigue than a healthy person), and 5 participants recorded a Likert CFQ score of 0.

The discordance between Likert and bimodal scores and the marked increase in those meeting post-hoc recovery thresholds suggest that White et al’s deviation from their protocol-specified analysis is likely to have profoundly affected the reported efficacy of the PACE trial interventions.
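Carter’s percentages follow directly from the counts he gives. A quick check in Python, with variable names of my own choosing:

```python
# Counts taken from Sam Carter's comment on the FINE trial data at week 20.
recovered_on_likert = 33   # met the post-hoc recovery threshold (Likert CFQ <= 18)
still_abnormal = 16        # of those, bimodal CFQ >= 4 (abnormal fatigue)
still_pace_eligible = 10   # of those, bimodal CFQ >= 6 (fatigued enough to enter PACE)

def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)

pct_abnormal = percent(still_abnormal, recovered_on_likert)
pct_eligible = percent(still_pace_eligible, recovered_on_likert)

print("still abnormally fatigued (%):", pct_abnormal)
print("still eligible for PACE (%):", pct_eligible)
```

These come to roughly 48% and 30%, which matches the “approximately 50%” and “30%” in the comment.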

Compare White et al.’s “more sensitively test our hypotheses” to Wearden et al.’s “might more reliably demonstrate…” explanation for switching outcomes.

A correction is needed to this assessment of risk of bias in the review for the White et al PACE trial:

[Image: the review’s risk of bias assessment for the White et al. study]

A figure on page 68 shows results of a subanalysis with the switched outcomes at the end of treatment.

[Figure: Analysis 1.1, fatigue at end of treatment]

This meta-analysis concludes that exercise therapy produced an almost 3 point drop in fatigue on the rescored Chalder scale at the end of treatment.

Analysis 1.2. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 2 Fatigue (follow-up).

A table on page 69 shows results of a subanalysis with the switched outcomes at follow up:

[Table: Analysis 1.2, fatigue at follow-up]

This meta-analysis depends entirely on the revised scoring of the Chalder fatigue scale and on the FINE and PACE trials. It suggests that the three point drop in fatigue persists at follow-up.

But Cochrane should have stuck with the original primary outcomes specified in the original trial registrations. That would have been consistent with what the Cochrane usually does, what it says it did here, and what its readers expect.

Readers were not at the meeting that the PACE investigators financed and cannot get access to the data on which the Cochrane review depends. So they depend on Cochrane as a trusted source.

I am sure the results would be different if the expected and appropriate procedures had been followed. Cochrane should alert readers with an Expression of Concern until the record can be corrected or the review retracted.

 Now what?

Is it too much to ask that Cochrane get out of bed with the PACE investigators?

What would Bill Silverman say? Rather than speculate about someone whom neither Dr. Tovey nor I have ever met, I ask Dr. Tovey, “What would Lisa Bero say?”


My response to an invitation to improve the Cochrane Collaboration by challenging its policies

I interpret a recent Cochrane Community Blog post as inviting me to continue criticizing the Collaboration’s conflict of interest in the evaluation of “chronic fatigue syndrome” with the intent of initiating further reflection on its practices and change.

Cochrane needs to

  • Clean up conflicts of interest in its systematic reviews.
  • Issue a Statement of Concern about a flawed and conflicted review of exercise for chronic fatigue syndrome.

I will leave for a future blog post the argument that Cochrane needs to take immediate steps to get the misnamed “chronic fatigue syndrome” out of its Common Mental Disorders group. The colloquialism throws together the highly prevalent primary care complaint of tiredness with the less common but more serious myalgic encephalomyelitis, which is recognized by the rest of the world as a medical condition, not a mental disorder.

But I think this blog post calls attention to enough that needs to change now.

The invitation from the Cochrane Community Blog to criticize its policies

I had a great Skype conference with Dr. David Tovey, Cochrane Editor in Chief and Deputy Chief Director. I’m grateful for his reaching out and his generous giving of his time, including reading my blog posts ahead of time.

In the email setting up the conversation, Dr. Tovey stated that Cochrane has a tradition of encouraging debate and that he believes that criticism helps them to improve. That is something he is very keen to foster.

Our conversation was leisurely and wide-ranging. Dr. Tovey lived up to the expectations established in his email. He said that he was finishing up a blog post in response to issues that I and others had raised. That blog post is now available here. It leads off with:

 I didn’t know Bill Silverman, so I can’t judge whether he would be “a-mouldering in his grave”. However, I recognise that James Coyne has set down a challenge to Cochrane to explain its approach to commercial and academic conflicts of interest and also to respond to criticisms made in relation to the appraisal of the much debated PACE study.

Dr. Tovey closed his blog post with:

 Cochrane is not complacent. We recognise that both we and the world we inhabit are imperfect and that there is a heavy responsibility on us to ensure that our reviews are credible if they are to be used to guide decision making. This means that we need to continue to be responsive and open to criticism, just as the instigators of the Bill Silverman prize intended, in order “to acknowledge explicitly the value of criticism of The Cochrane Collaboration, with a view to helping to improve its work.”

 As a member of a group of authors who received the Bill Silverman prize, I am interpreting Dr. Tovey’s statement as an invitation to improve the Cochrane collaboration by instigating and sustaining a discussion of its handling of conflicts of interest in reviews of the misnamed “chronic fatigue syndrome.”

I don’t presume that Dr. Tovey will personally respond to all of my efforts. But I will engage him and hope that my criticisms and concerns will be forwarded to appropriate deliberative bodies and receive wider discussion within the Cochrane.

For instance, I will follow up on his specific suggestion by filing a formal complaint with Funding Arbiters and Arbitration Panel concerning a review and protocol with Lillebeth Larun as first author.

 A flawed and conflicted Cochrane systematic review

 There are numerous issues that remain unresolved in a flawed and conflicted recent Cochrane systematic review:

 Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2016; CD003200.

As well as a protocol for a future review:

Larun L, Odgaard-Jensen J, Brurberg KG, Chalder T, Dybwad M, Moss-Morris RE, Sharpe M, Wallman K, Wearden A, White PD, Glasziou PP. Exercise therapy for chronic fatigue syndrome (individual patient data) (Protocol). Cochrane Database of Systematic Reviews 2014, Issue 4. Art. No.: CD011040.

I’m pleased that Dr. Tovey took a stand against the PACE investigators and Queen Mary University of London. He agreed that sharing patient-level data for a Cochrane review on which they were authors should not be used as an excuse to avoid sharing data with others.

 Another issue raised by Coyne has also been raised with me in personal correspondence: namely the perceived use of Cochrane as a rationale for withholding clinical trials data at the level of individual patients from other individuals and organisations. Cochrane is a strong supporter and founding member of the AllTrials initiative and is committed to clinical trials transparency. Cochrane does not believe that sharing data with its researchers is an appropriate rationale for withholding the data from alternative researchers. Each application must be judged independently on its merits. Cochrane has issued a public statement that details our position on access to trial data.

I hope that Dr. Tovey’s sentiment was formally communicated to the Tribunal deliberating an appeal by the PACE investigators of a decision by the UK Information Commissioner that the trial data must be released to someone who had requested it.

I also hope that Dr. Tovey and the Cochrane recognize the implications of the PACE investigators thus far only being willing to share their data when they have authorship and therefore some control over the interpretation of their data. As Dr. Tovey notes, simply providing data does not meet the conditions for authorship:

 It is also important that all authors within a review team meet the requirements of the International Committee of Medical Journal Editors (ICMJE) in relation to authorship.

These requirements mean that all authors must approve the final version of the manuscript before it is submitted. This allows the PACE investigators to control the conclusions of the systematic review so that they support the promotion of cognitive behavior therapy and graded exercise therapy as the most evidence-supported treatments for chronic fatigue syndrome.

A favorable evaluation by the Cochrane will greatly increase the value of the PACE group’s consultations, including recommendations that disabled persons be denied benefits if they do not participate in these “best-evidence” interventions.

I’m pleased that Dr. David Tovey reiterated the Cochrane’s strong position on disclosures of conflict of interest being necessary but not sufficient to ensure the integrity of systematic reviews:

 Cochrane is still fairly unusual within the journal world in that it specifies that in some cases declaration of interests is necessary but insufficient, and that there are individuals or groups of researchers who are not permitted to proceed with a given systematic review.

Yet, I’m concerned that in considering the threat of disclosed and undisclosed conflicts of interest, Dr. Tovey and the Cochrane narrowly focus on Pharma and medical device manufacturers, to the exclusion of other financial ties, such as the large disability re-insurance industry:

 Within the 2014 policy it was made explicit that review authors could not be employed by pharmaceutical companies, device manufacturers or individuals that were seeking or holding a patent relevant to the intervention or a comparator product. Furthermore, in all cases, review author teams are required to have a majority of non-conflicted authors and the lead author should also be non-conflicted. The policy is available freely.

The Cochrane apparently lacks an appreciation of the politics and conflicts of interest of the PACE trial. The trial has the unusual if not unique distinction of being a psychotherapy trial funded in part by the UK Department for Work and Pensions, which had a hand in its design. It’s no accident that the PACE investigators include paid consultants to the re-insurance industry. For more on this mess, see The Misleading Research at the Heart of Disability Cuts.

It also doesn’t help that the PACE investigators routinely fail to declare conflicts of interest. They failed to disclose their conflicts of interest to patients being recruited for the study. They failed again, until they were caught, when they declared no conflicts of interest in a protocol for another systematic review.

Dr. Tovey goes on to state:

Authors of primary studies should not extract data from their own study or studies. Instead, another author(s) or an editor(s) should extract these data, and check the interpretation against the study report and any available study registration details or protocol.

The Larun et al systematic review of graded exercise therapy violates this requirement. The meta-analyses forming the basis of this review are not reproducible from the published registrations, original protocols, and findings of the original studies.

Dr. Tovey is incorrect on one point:

 James Coyne states that Lillebeth Larun is employed by an insurance company, but I am unclear on what basis this is determined. Undeclared conflicts of interest are a challenge for all journals, but when they are brought to our attention, they need to be verified. In any case, within Cochrane it would be a matter for the Funding Arbiters and Arbitration Panel to determine whether this was a sufficiently direct conflict to disbar her from being first author of any update.

I can’t find anywhere that I have said that Lillebeth Larun is employed by an insurance company. But I did say that she has undeclared conflicts of interest.  These echo in her distorted judgments and defensive responses to criticisms of decisions made in the review that favor the PACE investigators’ vested interest.

Accepting Dr. Tovey’s suggestion, I’ll be elaborating my concerns in a formal complaint to Cochrane’s Funding Arbiters and Arbitration Panel. But here is a selection of what I previously said:

Larun dismisses the risk of bias associated with the investigators not sticking to the primary outcomes in their original protocol. She suggested deviations from these outcomes were specified before analyses commenced. However, this was an unblinded trial and the investigators could inspect incoming data. In fact, they actually sent out a newsletter to participants giving testimonials about the benefits of the trial while they were still recruiting patients. Think of it: if someone with ties to the pharmaceutical industry could peek at incoming data and make changes to designate outcomes, wouldn’t that be a high risk of bias? Of course.

Larun was responding to an excellent critique of the published review by Tom Kindlon, which you can find here.

Other serious problems with the review are hidden from the casual reader. In revising their primary outcomes specified in the original proposal, the PACE investigators had access to the publicly available data from the sister FINE trial (Wearden, 2010).

 Wearden AJ, Dowrick C, Chew-Graham C, Bentall RP, Morriss RK, Peters S, Riste L, Richardson G, Lovell K, Dunn G; Fatigue Intervention by Nurses Evaluation (FINE) trial writing group and the FINE trial group. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial. BMJ. 2010 Apr 23;340:c1777. doi: 10.1136/bmj.c1777.

These data from the FINE trial clearly indicated that the existing definition of the primary outcomes in the PACE trial registration would likely not provide evidence of the efficacy of cognitive behavior or graded exercise therapy. Not surprisingly, the PACE investigators revised their scoring of primary outcomes.

Moreover, the Larun et al review misrepresents how effect sizes for the FINE trial were calculated. The review wrongly claimed that only protocol-defined and published data or outcomes were used for analysis of the Wearden 2010 study.

Robert Courtney documents in a pending comment that the review relied on an alternative unpublished set of data. As Courtney points out, the differences are not trivial.

Yet, the risk of bias table in the review for the Wearden study states:

[Image: the review’s risk of bias entry for selective reporting in Wearden et al.]

Financial support for a meeting between Dr. Lillebeth Larun and PACE investigators

The statement of funding for the 2014 protocol indicates that Peter White financed meetings at Queen Mary University in 2013. If this were a Pharma-supported 2016 systematic review, wouldn’t Larun have to disclose a conflict of interest for attendance at the 2014 meeting sponsored by PACE investigators?

Are these meetings the source of the acknowledgment in the 2016 systematic review?

We would like to thank Peter White and Paul Glasziou for advice and additional information provided. We would also like to thank Kathy Fulcher, Richard Bentall, Alison Wearden, Karen Wallman and Rona Moss-Morris for providing additional information from trials in which they were involved.

The declared conflicts of interest of the PACE investigators in The Lancet paper constitute a high risk of bias. I am familiar with this issue because our article which won the Bill Silverman Award highlighted the importance of authors’ conflicts of interest being associated with exaggerated estimates of efficacy. The award to us was premised on our article having elicited a change in Cochrane policy. My co-author Lisa Bero wrote an excellent follow-up editorial for Cochrane on this topic.

 This is a big deal and action is needed

Note that this 2016 systematic review considered only three new studies that were not included in the 2004 review. So, the misrepresentations and incorrect calculation of effect sizes for two added trials – PACE and FINE – are decisive.

As it stands, the Larun et al Cochrane Review is an unreliable summary of the literature concerning exercise for “chronic fatigue syndrome.” Policymakers, clinicians, and patients should be warned. It serves the interests of politicians and re-insurance companies, and the declared and undeclared interests of the PACE investigators.

I would recommend that Dr. Lillebeth Larun recuse herself from further commentary on the 2016 systematic review until complaints about her conflicts of interest and unreproducibility of the review are resolved. The Cochrane should also publish an Expression of Concern about the review, detailing the issues that have been identified here.

Stay tuned for a future blog post concerning the need to move “chronic fatigue syndrome” out of the Cochrane Common Mental Disorders group.



Why the Cochrane Collaboration needs to clean up conflicts of interest

  • A recent failure to correct a systematic review and meta-analysis demonstrates that Cochrane’s problem with conflict of interest is multilayered.
  • Cochrane enlists thousands of volunteers committed to the evaluation of evidence independent of the interests of the investigators who conducted trials.
  • Cochrane is vigilant in requiring declaration of conflicts of interest but is inconsistent in policing their influence on reviews.
  • The Cochrane has a mess to clean up.

A recent interview of John Ioannidis by Retraction Watch expressed concern about Cochrane’s tainting by conflict of interest:

RW: You’re worried that Cochrane Collaboration reviews — the apex of evidence-based medicine — “may cause harm by giving credibility to biased studies of vested interests through otherwise respected systematic reviews.” Why, and what’s the alternative?

JI: A systematic review that combines biased pieces of evidence may unfortunately give another seal of authority to that biased evidence. Systematic reviews may sometimes be most helpful if, instead of focusing on the summary of the evidence, highlight the biases that are involved and what needs to be done to remedy the state-of-the-evidence in the given field. This often requires a bird’s eye view where hundreds and thousands of systematic reviews and meta-analyses are examined, because then the patterns of bias are much easier to discern as they apply across diverse topics in the same or multiple disciplines. Much of the time, the solution is that, instead of waiting to piece together fragments of biased evidence retrospectively after the fact, one needs to act pre-emptively and make sure that the evidence to be produced will be clinically meaningful and unbiased, to the extent possible. Meta-analyses should become primary research, where studies are designed with the explicit anticipation that they are part of an overarching planned cumulative meta-analysis.

The key points were (1) Retraction Watch is raising with John Ioannidis the concern that evidence-based medicine has been hijacked by special interests; (2) RW is specifically asking about the harm caused by the Cochrane Collaboration in lending undue credibility to studies biased by vested interests; and (3) Ioannidis replies that instead of focusing on summarizing the evidence, Cochrane should highlight biases and point to what needs to be done to produce trustworthy, clinically meaningful and unbiased assessments.

A recent exchange of comments about a systematic review and meta-analysis demonstrates the problem.

Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2016; CD003200.

The systematic review is behind a paywall. That is particularly unfortunate because persons providing systematic reviews undergo extensive training and then work for free. The fruits of their labor identifying what would be the best evidence should be available around the world, for free. But to see their work, one has to either go through a university library or pay a fee to the for-profit Wiley.

An abridged version of the review is available here:

Larun L, Odgaard-Jensen J, Price JR, Brurberg KG. An abridged version of the Cochrane review of exercise therapy for chronic fatigue syndrome. European Journal of Physical and Rehabilitation Medicine. 2015 Sep.

To get around the paywall of the full review, the commentator, Tom Kindlon, cleverly reposted his comment at PubMed Commons, where everybody can access it for free.

In his usual polite style, Mr Kindlon opens by thanking the authors of the systematic review and closes by thanking them for reading his comments. In between, he makes a number of interesting points before getting to the following:

“Selective reporting (outcome bias)” and White et al. (2011)

I don’t believe that White et al. (2011) (the PACE Trial) (3) should be classed as having a low risk of bias under “Selective reporting (outcome bias)” (Figure 2, page 15). According to the Cochrane Collaboration’s tool for assessing risk of bias (21), the category of low risk of bias is for: “The study protocol is available and all of the study’s pre-specified (primary and secondary) outcomes that are of interest in the review have been reported in the pre-specified way”. This is not the case in the PACE Trial. The three primary efficacy outcomes can be seen in the published protocol (22). None have been reported in the pre-specified way. The Cochrane Collaboration’s tool for assessing risk of bias states that a “high risk” of bias applies if any one of several criteria are met, including that “not all of the study’s pre-specified primary outcomes have been reported” or “one or more primary outcomes is reported using measurements, analysis methods or subsets of the data (e.g. subscales) that were not pre-specified”. In the PACE Trial, the third primary outcome measure (the number of “overall improvers”) was never published. Also, the other two primary outcome measures were reported using analysis methods that were not pre-specified (including switching from the bimodal to the Likert scoring method for The Chalder Fatigue Scale, one of the primary outcomes in your review). These facts mean that the “high risk of bias” category should apply.

I’m sure John Ioannidis would be pleased with Kindlon raising this point.

In order to see the response from the author of the systematic review, one has to get behind the paywall. If you do that, you can see that Lillebeth Larun reciprocates Kindlon’s politeness, agrees that some of his points should be reflected in future research, but takes issue with a key one. I haven’t asked him, but I don’t think John Ioannidis would be happy with her response:

Selective reporting (outcome bias)

The Cochrane Risk of Bias tool enables the review authors to be transparent about their judgments, but due to the subjective nature of the process it does not guarantee an indisputable consensus. You particularly mention the risk of bias in the PACE trial regarding not providing pre-specified outcomes however the trial did pre-specify the analysis of outcomes. The primary outcomes were the same as in the original protocol, although the scoring method of one was changed and the analysis of assessing efficacy also changed from the original protocol. These changes were made as part of the detailed statistical analysis plan (itself published in full), which had been promised in the original protocol. These changes were drawn up before the analysis commenced and before examining any outcome data. In other words they were pre-specified, so it is hard to understand how the changes contributed to any potential bias. The relevant paper also alerted readers to all these changes and gave the reasons for them. Overall, we don’t think that the issues you raise with regard to the risk of selective outcome bias are such as to suspect high risk of bias, but recognize that you may reach different conclusions than us.

I strongly take issue and see conflicts of interest rearing their ugly heads at a number of points.

  1. One can’t dismiss application of the Cochrane Risk of Bias tool as simply being subjective and then say whatever you want to say. The tool has well-specified criteria, and persons completing a review have to be trained to consensus. One of the key reasons that a single author can’t conduct a proper Cochrane Collaboration review is that it requires a trained team to agree on ratings of risk of bias. That’s one of the many checks and balances built into a systematic review.

Fortunately, Cochrane provides an important free chapter as a guide. Lots of people who conduct systematic reviews and meta-analyses who are not members of Cochrane nonetheless depend on the materials that the collaboration has developed because they are so clear, authoritative, and transparent in terms of the process by which they were developed.

  2. Largely as a result of our agitation,* applying the sixth of six risk of bias items (other bias) assesses whether the investigators of a particular trial have a conflict of interest. The authors of the trial in question had a strong conflict of interest, including paid and volunteer work for an insurance company and as assessors of disability eligibility. Ioannidis would undoubtedly consider this a high risk of bias.
  3. Larun dismisses the risk of bias associated with the investigators not sticking to the primary outcomes in their original protocol. She suggests that deviations from these outcomes were specified before analyses commenced. However, this was an unblinded trial and the investigators could inspect incoming data. In fact, they actually sent out a newsletter to participants giving testimonials about the benefits of the trial while they were still recruiting patients. Think of it: if someone with ties to the pharmaceutical industry could peek at incoming data and make changes to designated outcomes, wouldn’t that be a high risk of bias? Of course.
  4. But it gets worse. Larun is a co-author with the investigators of the trial on another Cochrane protocol.

Larun L, Odgaard-Jensen J, Brurberg KG, Chalder T, Dybwad M, Moss-Morris RE, Sharpe M, Wallman K, Wearden A, White PD, Glasziou PP. Exercise therapy for chronic fatigue syndrome (individual patient data) (Protocol). Cochrane Database of Systematic Reviews 2014, Issue 4. Art. No.: CD011040.

  5. And one of the authors of the systematic review under discussion is a colleague in the department of the trial investigators.

How does Cochrane define conflict of interest?

I’m a member of Cochrane and so I am required to complete a yearly assessment of potential conflicts of interest. My report is kept on file by the collaboration but is not necessarily directly available to the public. You can download a PDF of the evaluation and an explanation here.

As you can see, Cochrane staff and reviewers need to disclose (1) the financing of their review; (2) relevant financial activities outside the submitted work; (3) intellectual property such as patents, copyrights, and royalties; and (4) other relationships, for which the instructions read:

Use this section to report other relationships or activities that readers could perceive to have influenced, or that give the appearance of potentially influencing, what you wrote in the submitted work.

The conflicts of interest of Lillebeth Larun

A co-author of Lillebeth Larun on the systematic review under discussion is a colleague in the department of the investigators whose trial is being evaluated. Larun is a co-author on another protocol with these investigators. Examination of the acknowledgments of that protocol indicates that the investigators provided both data and funding for meetings:

The author team held three meetings in 2011, 2012 and 2013 which were funded as follows:

  • 2011 via Paul Glasziou, NIHR senior research fellow fund, Oxford Department of primary care.
  • 2012 via Hege R Eriksen, Uni Research Health, Bergen.
  • 2013 via Peter D White’s academic fund (Professor of Psychological Medicine, Centre for Psychiatry, Wolfson Institute of Preventive Medicine, Barts and The London School of Medicine and Dentistry, Queen Mary University of London).

So, both the systematic review under discussion and the other protocol were conducted among “families and friends”. In dismissing concerns about risk of bias for a particular trial, Lillebeth Larun is ignoring her own obvious strong bias in favor of her associates.

She has no business conducting this review or dismissing the high risk of bias that comes with including their study.

So, what is going on here?

Peter White and the PACE investigator team are attempting to break down the checks and balances that a systematic review imposes on interpretation of results of clinical trials. That interpretation should be independent of the investigators who generated a trial and take into account their conflicts of interest. The PACE investigators had a conflict of interest when they generated the data, and now they want to control the interpretation so that it comes out in favor of their interests.

Some PACE investigators have ties to insurance companies and they want the results to fit with the needs of these companies. Keep in mind that the insurance companies don’t necessarily care whether treatments work. Their intent is to require participation in treatment as a condition for receiving disability payments and to exclude disabled persons who do not want treatment.

Cochrane collaboration takes conflict of interest seriously

A statement by the two editors heading the Cochrane Bone, Joint, and Muscle Trauma Group is quite quotable about the threat that involvement of the original trial investigators poses to the integrity of systematic reviews.

Handoll H, Hanchard N. From observation to evidence of effectiveness: the haphazard route to finding out if a new intervention works. Cochrane Database of Systematic Reviews. 2014 Jan 1.

They state:

We feel should become a cardinal rule: the need to separate the clinical evaluation of innovations from their innovators, who irrespective of any of their endeavours to be ‘neutral’ have a substantial investment, whether emotional, perhaps financial, or in terms of professional or international status, in the successful implementation of their idea.

Disclosure of conflicts of interest may be insufficient to mitigate the effects:

The reporting of financial conflicts of interest in systematic reviews may not be sufficient to mitigate the effects of industry affiliations, and further measures may be necessary to ensure that industry collaborations do not compromise the scientific evidence.

Although these editors are concerned about pharmaceutical companies, their comments apply equally to other conflicts. In the case of the systematic review, the investigators of the original trial have financial conflicts and collaborations with the spokeswoman/first author of the systematic review under discussion. She has additional conflicts associated with their co-authoring and funding of another systematic review protocol.

I believe that if Cochrane is intent on restoring its credibility, not only does it need to clean up this mess of layered conflicts of interest, it should also investigate how it came about and how it can be avoided in the future.

I’ve already written to the collaboration and I await the response.



*Our article in The BMJ that won the Bill Silverman award specifically recommended:

…That the Cochrane Collaboration reconsider its position that trial funding and trial author-industry financial ties not be included in the risk of bias assessment. The 2008 version of the Cochrane handbook listed “inappropriate influence of funders” (section (for example, data owned by industry sponsor) as a potential source of bias that review authors could optionally incorporate in the “other sources of bias” domain of the Cochrane risk of bias tool.37 The 2011 version of the handbook, however, argues that “vested interests” should not be included in the risk of bias assessment, which “should be used to assess specific aspects of methodology that might be been influenced by vested interests and which may lead directly to a risk of bias” (section As previously noted,22 empirical criteria are generally used to select items (for example, sequence generation, blinding) that are included in assessments of risk of bias,38 48 including evidence of a mechanism, direction, and likely magnitude of bias. Empirical data show that trial funding by pharmaceutical companies and trial author-industry financial ties are associated with a bias towards positive results even when controlling for other study characteristics6 8 49 50 and, thus, meet these criteria. One concern might be that including conflicts of interest from included trials in the risk of bias assessment could result in “double counting” of potential sources of bias. However, ratings in the risk of bias table are not summed to a single score, and inclusion of risk of bias from conflicts of interest could reflect mechanisms through which industry involvement can influence study outcomes6 that are not fully captured by the current domains of the risk of bias tool (random sequence generation, allocation concealment, blinding of participants and staff, blinding of outcome assessment, incomplete outcome data, selective reporting, and other sources of bias). 
Furthermore, even if all relevant mechanisms were to be assessed, the degree of their influence may not be fully captured when reviewers only have access to the relatively brief descriptions of trial methods that are provided in most published reports. Inclusion of conflicts of interest from included trials in the risk of bias assessment would encourage a transparent assessment of whether industry funded trials and independently conducted trials reach similar conclusions. It would also make it explicit when an entire area of research has been funded by industry and would benefit from outside scrutiny.