Are most positive findings in psychology false or exaggerated? An activist’s perspective

Abstract of a talk to be given at the Australian National University, room G08, Building 39, 3pm September 11, 2014.

In 2005, John Ioannidis made a controversial assertion in a now-famous PLoS Medicine paper, “Why Most Published Research Findings Are False”. The paper demonstrated that many positive findings in biomedicine subsequently proved to be false, and that most discoveries either are not replicated or can be shown to be exaggerated. The relevance of these demonstrations to psychology was not appreciated until later.

Recent documented examples of outright fraud in the psychological literature have spurred skepticism. However, while outright fraud may be rare, confirmatory bias and flexible rules of design and analysis are rampant and even implicitly encouraged by journals seeking newsworthy articles. Efforts at reform have met with considerable resistance, as seen in the blowback against the replicability movement.

This talk will describe the work of one loosely affiliated group to advance reform by focusing attention not only on the quality of the existing literature, but on the social and political processes at the level of editing and reviewing. It will give specific examples of recent and ongoing efforts to dilute the absolute authority of editors and prepublication reviewers, and instead enforce transparency and greater reliance on post-publication peer review of claims and data.

Optional suggested readings (I suggest only one or two as background)

Is evidence-based medicine as bad as bad Pharma?

I am holding my revised manuscript hostage until the editor forwards my complaint to a rogue reviewer.

Reanalysis: No health benefits found for pursuing meaning in life versus pleasure.

A formal request for retraction of a cancer article

I reply to John Grohol’s “PLOS blogger calls out PLOS One – Huh?”

Apparently John Grohol was taken aback by my criticism of neurononsense in a PLOS One article. I am pleased at gaining recognition at his highly accessed blog, but I think he was at least a bit confused about what was going on. The following comment has been left at his blog post for approval.

John, thank you for encouraging people to read my blog post. If I ever get around to monetizing my activity by collecting my blog posts in a book, I will remind you that you said I know bad research when I see it and even that I am brilliant. I will ask for a dust jacket endorsement or maybe a favorable review at Amazon.

Like you, I find it amazing that I was allowed free rein as a blogger to mount such an attack on an article that PLOS had published. I had filed a complaint with the journal over the undisclosed conflict of interest of one of the authors. I informed the managing editors that I would be merciless in my blog post in exposing what could be found with meticulous reads and rereads of an article, once I was alerted by a conflict of interest. The journal is processing my formal complaint. Management indicated no interest in previewing my blog post, asking only that I indicate it was my personal opinion, not that of the journal. I am confident that if this process were unfolding at a for-profit journal, or worse, one associated with a professional organization such as the American Psychological Association or the Association for Psychological Science, there would have been an effort to muzzle me, but little prospect of a full formal review of my complaint that respected the authors’ rights as well as my own.

PLOS One requests disclosure of potential conflicts of interest in exquisite detail and accompanies every article with a declaration. It has an explicit procedure for reviewing complaints of breaches of its policies. I expressed my opinion, both in my blog post and in my complaint, that the authors violated the trust of the journal and its readers by failing to disclose the extensive financial benefits associated with an uncritical acceptance of the claims made in the article. But PLOS One does not go on the opinion of one person. Even when the complainant is one of its 4000 Academic Editors, it reviews the evidence and solicits additional information from authors. I am confident in the fairness of the outcome of that review. If it does not favor my assessment, I will apologize to the authors, but still assert that I had a strong basis for my complaints.

Like many people associated with PLOS, I have great skepticism about the validity of prepublication review in certifying the reliability of what is said in published papers. My blog posts are an incessant effort to cultivate skepticism in others and provide them with the tools to decide for themselves whether to accept what they read. PLOS is different from most journals in providing readers with immediate opportunities to comment on articles in a way that will be available to anyone accessing them. I encourage readers to make more use of those opportunities, as well as PubMed Commons for post-publication peer review.

I am pleased and flattered that you think I laid the problems in the article so bare that they are now obvious and should have precluded publication. But it took a lot of work, many rereads, and expertise that I alone do not possess. I got extensive feedback from a number of persons, including neuroscientists, and I am particularly indebted to Neurocritic. I highly recommend his earlier blog post about this article. He had proceeded to the point of sniffing out that something was amiss. But when he called out Magneto, the BS detector, to investigate, he was thwarted by the lack of some technical details, as well as his inability to get into the down and dirty of the claims being made about clinical psychological science. As we went back and forth, building on each other’s revelations, we were both shocked and treated to aha experiences – why didn’t I notice that?

Initially, we trusted the authors citing a previous paper in Psychological Science for the validity of their methods and their choice of Regions of Interest (ROIs) of the brain for study. It took a number of reads of that paper to discover that they were not accurately reporting what was in that paper or the lack of correspondence to what was done in the PLOS paper. I consider the Psychological Science paper just as flawed and at risk for nonsense interpretations, but I have no confidence in APS or that journal’s tolerance for being called out on shortcomings in their peer review. I speak from experience.

Taken together, I consider my blog post and the flaws in the PLOS article that I targeted as indications of the need for readers to be skeptical about depending on prepublication peer review to evaluate the reliability of articles. And let us see from the outcome of the formal review whether there is the self-correction, if necessary, that I think we can depend on PLOS to provide.

Finally, let’s all insist on disclosure of conflicts of interest in every paper, not just those coming from the pharmaceutical industry. My documentation of the problems with promoters of Triple P Parenting has led to bar fights with some journals and even formal complaints to the Committee on Publication Ethics (COPE). Keep watching my blog posts to see the outcome of those fights. Disclosures of conflicts of interest depend on the candor of authors. I would have been a hypocrite if I did not call out the authors of a PLOS One article in the same way that I call out authors of publications in other journals.

PS. For the record, I quit blogging at Psychology Today because management changed one of my titles so as not to offend pharmaceutical companies.


Deconstructing misleading media coverage of neuroscience of couples therapy

Do we owe psychotherapists something more than noble lies and fairy tales in our translations of fMRI results?

The press release below was placed on the web by the University of Ottawa and refers to an article published in PLOS One. As I have noted in blog posts here and here, the PLOS One article is quite bad. But the press release is worse, and introduces a whole new level of distortion.

Comparing the article to the press release or my blog posts, you can get a sense of the nonsense in the press release. I will summarize the contradictions between these sources in my comments, interspersed with excerpts from the press release.

True love creates resilience, turning off fear and pain in the brain

OTTAWA, May 1, 2014— New research led by Dr. Sue Johnson of the University of Ottawa’s School of Psychology confirms that those with a truly felt loving connection to their partner seem to be calmer, stronger and more resilient to stress and threat.

In the first part of the study, which was recently published in PLOS ONE, couples learned how to reach for their lover and ask for what they need in a “Hold Me Tight” conversation. They learned the secrets of emotional responsiveness and connection.

If you go to the PLOS One article, you will see no mention of any “Hold Me Tight” conversation, only that couples selected for mild to moderate marital dissatisfaction received couples therapy. The therapy was of longer duration than had typically been provided in previous studies of Emotionally Focused Therapy (EFT). At completion, the average couple was still experiencing mild to moderate marital dissatisfaction and would still have qualified for entry into the study.

So, for a start, these were not couples feeling “loving connections to each other,” to the extent that the authors assume their quantitative measures are valid.

The second part of the study, summarized here, focused on how this also changed their brain. It compared the activation of the female partner’s brain when a signal was given that an electric shock was pending before and after the “Hold Me Tight” conversation.

The phrase “changed the brain” is vague and potentially misleading. It gives the false impression that there is some evidence that differences in fMRI results represent enduring, structural change, rather than transient, ambiguous changes in activity. Changes in brain activity do not equal changes in the structure of the brain. It is analogous to suggesting that the air conditioning coming on rearranged a room, beyond cooling it down temporarily. Or that viewing a TV soap opera changes the brain because the viewing is detectable with fMRI.

Before the “Hold Me Tight” conversation, even when the female partner was holding her mate’s hand, her brain became very activated by the threat of the shock — especially in areas such as the inferior frontal gyrus, anterior insula, frontal operculum and orbitofrontal cortex, where fear is controlled. These are all areas that process alarm responses. Subjects also rated the shock as painful under all conditions.

Let us ignore that there is no indication in the PLOS One paper of a “Hold Me Tight” conversation. It is a gross exaggeration to say that the brain “became very activated.” We have to ask, “Compared to what?” Activation of the brain is relative, and as the neuroscientist Neurocritic pointed out, there is no relevant comparison condition beyond partner versus stranger versus being alone. Such fMRI data have nothing equivalent to the standardization of an oven thermometer or a marital adjustment measure. And the results are different from what the press release would suggest. Namely,

In the vmPFC, left NAcc, left pallidum, right insula, right pallidum, and right planum polare, main effects of EFT revealed general decreases from pre- to post-therapy in threat activation, regardless of whose hand was held, all Fs (1, 41.1 to 58.6) > 3.9, all ps < .05. In the left caudate, left IFG, and vACC, interactions between EFT and DAS revealed that participants with the lowest pre-therapy DAS scores realized the greatest decreases from pre- to post-therapy in threat related activity, all Fs (1, 55.1 to 66.7) ≥ 6.2, all ps < .02. In the right dlPFC and left supplementary motor cortex, interactions between handholding and EFT suggest that from pre- to post-therapy, threat-related activity decreased during partner but increased during stranger handholding, Fs (1, 44.6 to 48.9) = 5.0, ps = .03 (see Figure 5). [Emphasis added]

Keep in mind that these results are derived from well over 100 statistical tests performed on data from 23 women, and so they are likely due to chance. It is difficult to make sense of the contradictions in the results. By some measures, activation decreased while holding both strangers’ and husbands’ hands. Other differences were limited to the women with lower initial marital satisfaction.
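To see why chance findings are expected, here is a back-of-the-envelope calculation. This is only an illustration, assuming 100 independent tests at the conventional .05 threshold; the actual tests were not independent, which complicates but does not rescue the situation:

```python
# With many tests at alpha = .05, "significant" results are expected by chance alone.
n_tests, alpha = 100, 0.05

# Number of spurious "hits" expected even if every null hypothesis is true
expected_false_positives = n_tests * alpha        # 5.0

# Probability of obtaining at least one false positive across all tests
p_at_least_one = 1 - (1 - alpha) ** n_tests       # ~0.994

print(expected_false_positives, round(p_at_least_one, 3))
```

With more than 100 tests, a handful of significant results is exactly what an entirely null data set would deliver.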

It is also not clear what decreased activation means. It could mean that fewer thought processes are occurring or that thought processes take less effort. fMRI is that ambiguous.

It is important to note what we are not told in the article. The authors lead us to expect an overall (omnibus) test of whether changes in brain activity from before to after therapy occur when husbands’ hands are held, but not when strangers’ are or when the women are alone. It is curious that this specific statistic is not reported where it should have been. It is likely that no overall simple difference was found, but the authors went fishing for whatever they could find anyway.

There is no mention in the paper of ratings of the shock as to degree of painfulness. There are ratings of discomfort.

We need to keep in mind that this experiment had to be approved by a committee for the protection of human subjects. If in fact the women were being subject to painful shock, the committee would not have granted approval.

The actual shock was 4 mA. I put out a request on Facebook for information as to how painful such a shock would be. A lab in Australia reported that, in response, graduate assistants had been busy shocking themselves. With that amperage, they could not produce a shock they would consider painful.

However, after the partners were guided through intense bonding conversations (a structured therapy titled Emotionally Focused Couple Therapy or EFT), the brain activation and reported level of pain changed —under one condition. While the shock was again described as painful in the alone and in the stranger hand holding conditions (albeit with some small change compared to before), the shock was described as merely uncomfortable when the husband offered his hand. Even more interesting, in the husband hand-holding condition, the subject’s brain remained calm with minimal activation in the face of threat.

Again, there are no ratings of painfulness described in the report of the experiment. The changes occurred in both husband and stranger handholding conditions.

The experiment explored three different conditions. In the first, the subject lay alone in a scanner knowing that when she saw a red X on a screen in front of her face there was a 20% chance she would receive a shock to her ankles. In the second, a male stranger held her hand throughout the same procedure. In the third, her partner held her hand. Subjects also pressed a screen after each shock to rate how painful they perceived it to be.

Here we are given a relevant detail. The women believed that they had a 20% chance of receiving a shock to their ankles. It is likely that the anticipation was uncomfortable, not the actual shock. The second condition is described as having their hand held by a male stranger. Depending on the circumstances, that could be either creepy or benign. Presumably, the “male stranger” was a laboratory assistant. That might explain why the actual results of the experiments suggest that after therapy, handholding by this stranger no longer activated areas of the brain that it had earlier.

But the press release provides a distorted presentation of the actual results of the study. It seems to indicate that the EFT that had occurred between the first and second fMRIs produced an effect only for the condition in which the woman’s hand was held by a partner, not a stranger.

The actual results were weak and contradictory. There do not seem to be overall effects for pre- versus post-therapy fMRI. Rather, effects were limited to a subgroup of women who had started therapy with exceptionally low marital satisfaction, and these effects persisted after therapy. The changes in brain activation associated with having their hand held by a partner were not different from the changes for having their hand held by a stranger.

These results support the effectiveness of EFT and its ability to shape secure bonding. The physiological effects are exactly what one would expect from more secure bonding. This study also adds to the evidence that attachment bonds and their soothing impact are a key part of adult romantic love.

How could this be accurate? The women did not have a secure bond with the stranger, but their brain activation nonetheless changed. And apparently this did not happen for all women, mostly only those with lower marital satisfaction at the beginning of therapy.

From the press release, I cannot reconstruct what was done and what was found in the study reported in PLOS One. A lot of wow, a lot of shock and awe, but little truth.

Surely, you jest, Dr. Johnson.

Tools for Debunking Neuro-Nonsense About Psychotherapy

The second of my two-part blog post at PLOS Mind the Brain assists readers in doing some debunking of bad neuroscience for themselves. The particular specimen is neurononsense intended to promote emotionally focused psychotherapy (EFT) to the unwary. A promotional video and press releases drawing upon a PLOS One article were aimed at wowing therapists seeking further training and CE credits. The dilemma is that most folks are not equipped with the basic neuroscience to detect neurobollocks. Near the end of the blog post, I provide some basic principles for cultivating skepticism about bad neuroscience. Namely,

Beware of

  • Multiple statistical tests performed with large numbers of fMRI data points from small numbers of subjects. Results capitalize on chance and probably will not generalize.
  • The new phrenology, claims that complex mental functions are localized in single regions of the brain so that a difference for that mental function can be inferred from a specific finding for that region.
  • Glib interpretations of what it means that a particular region of the brain is activated. Activation may simply mean that certain mental processes are occurring; among other things, it could mean that these processes are now taking more effort.
  • Claims that changes in activation observed in fMRI data represent changes in the structure of the brain or mental processes. Function does not equal structure.

But mainly, I guided readers through the article calling attention to anomalies and just plain weirdness at the level of basic numbers and descriptions of procedures. Some of my points were rather straightforward, but some may need further explanation or documentation. That is why I have provided this auxiliary blog.

The numbers below correspond to footnotes embedded in the text of the Mind the Brain blog post.

1. Including or excluding one or two participants can change results.

Many of the analyses depended on correlation coefficients. For a sample of 23, a correlation of .41 is required for a significance of .05. To get a sense of how adding or leaving out a few subjects can affect results, look at the scatterplots below.

The first has 28 data points and a correlation of -.272. The second plot adds three data points, which were not particularly outliers, and the correlation jumps to -.454.
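The n = 23 significance threshold quoted above is easy to verify. A minimal sketch using scipy, assuming a two-tailed test at the .05 level and the standard t-distribution relation for Pearson's r:

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Smallest |r| that reaches two-tailed significance at alpha for sample size n."""
    df = n - 2
    t = stats.t.ppf(1 - alpha / 2, df)        # critical t for the two-tailed test
    return t / math.sqrt(t ** 2 + df)         # convert t back to a correlation

print(round(critical_r(23), 3))    # ~0.413, matching the .41 quoted above
print(round(critical_r(100), 3))   # with a decent sample, ~0.197 would suffice
```

The threshold shrinks quickly with sample size, which is exactly why a sample of 23 leaves so much room for a few data points to flip a result in or out of significance.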

[Scatterplot 1: 28 data points, r = -.272]

[Scatterplot 2: with three added data points, r = -.454]






2. There is some evidence this could have occurred after initial results were known.

The article notes:

 A total of 35 couples completed the 1.5 hour pre-therapy fMRI scan. Over the course of therapy, 5 couples either became pregnant, started taking medication, or revealed a history of trauma which made them no longer eligible for the study. Four couples dropped out of therapy and therefore did not complete the post EFT scan, two couples were dropped for missing data, and one other was dropped whose overall threat-related brain activation in a variety of regions was an extreme a statistical outlier (e.g., greater than three standard deviations below the average of the rest of the sample).

I am particularly interested in the women who revealed a history of trauma after the initial fMRI. When did they reveal it? Did disclosure occur in the course of therapy?

If the experiment had the rigor of a clinical trial as the authors claim, results for all couples would be retained, analogous to what is termed an “intention-to-treat analysis.”

There are clinical trials that started with more patients per cell in which dropping or retaining just a few patients affected the overall significance of results. Notable examples are Fawzy et al., who turned a null trial into a positive one by dropping three patients, and Classen et al., in which the results of a trial with 353 participants are significant or not depending on whether one patient is excluded.

3. Any positive significant findings are likely to be false and, of necessity, will be large in magnitude, even when they are false positives.

A good discussion of the likelihood that significant findings from underpowered trials are false can be found here. Significant findings from small numbers of participants are necessarily large, because larger effect sizes are required to reach significance in small samples.
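This inflation can be demonstrated by simulation. A sketch assuming, purely for illustration, a true correlation of .20 and the study's n of 23 (0.413 is the two-tailed .05 critical r for that sample size):

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho, sims, crit = 23, 0.20, 5000, 0.413   # crit: two-tailed .05 threshold for n = 23

sig_rs = []
for _ in range(sims):
    # Draw a sample with true correlation rho
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    r = np.corrcoef(x, y)[0, 1]
    if abs(r) > crit:                        # keep only "significant" results
        sig_rs.append(abs(r))

power = len(sig_rs) / sims                   # how often a true r of .20 is detected
mean_significant_r = float(np.mean(sig_rs))  # average of the "significant" estimates
```

By construction, every estimate that clears the significance bar exceeds .413, more than double the true effect of .20; selectively reporting significant results therefore guarantees exaggeration.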

4. They stated that they recruited couples with the criterion that their marital dissatisfaction initially be between 80 and 96 on the DAS. They then report that the initial mean DAS score was 81.2 (SD = 14.0). Impossible.

Couples with mild to moderate marital distress are quite common in the general population to which advertisements were directed. It is statistically improbable that they recruited from such a pool and obtained a mean score of 81.2. Furthermore, with a lower bound of 80, it makes no sense that a mean score of 81.2 would come with a standard deviation of 14. This is overall a very weird distribution if we accept what they say.
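The arithmetic can be checked directly. A sketch assuming the stated entry range of 80 to 96: even the most spread-out sample possible, with every score piled at one endpoint or the other, cannot come close to a standard deviation of 14:

```python
import numpy as np

# The most spread-out sample of 23 DAS scores possible within the stated 80-96 range:
# scores split between the two endpoints
scores = np.array([80.0] * 12 + [96.0] * 11)

sd_max = scores.std(ddof=1)   # the largest SD any such bounded sample can attain
print(round(sd_max, 2))       # ~8.17, far below the reported 14.0
```

No sample confined to an 80–96 entry window can produce SD = 14, let alone one with a mean of 81.2, which sits almost on the floor of the range.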

5. The amount of therapy that these wives received (M = 22.9 sessions, range = 13-35) was substantially more than what was provided in past EFT outcome studies. Whatever therapeutic gains were observed in this sample could not be expected to generalize to past studies.

Past outcome studies of EFT have provided 8 to 12 sessions of EFT with one small dissertation study providing 15 sessions.

6. The average couple finishing the study still qualified for entering it.

Mean DAS score after EFT was declared complete was 96.0 (SD = 17.2). In order to enroll in the study, couples had to have DAS scores of 97 or less.

7. No theoretical or clinical rationale is given for not studying husbands or presenting their data as well.

Jim Coan’s video presentation suggests that he was inspired to do this line of research by observing how a man in individual psychotherapy for PTSD was soothed by his wife in the therapy sessions after the man requested that she be present. There is nothing in the promotional materials associated with either the original Coan study or the present one to indicate that fMRI would be limited to wives.

Again, if the studies really had the rigor of a clinical trial, as the authors claim, the exclusive focus on wives’ versus husbands’ fMRI would have been pre-specified in the registration of the study. But neither study was registered.

8. The size of many differences between results characterized as significant versus nonsignificant is not itself statistically significant.

With a sample size of 23, let’s take a correlation coefficient of .40, which just misses statistical significance. A correlation of roughly .80 (p < .001) is required to be significantly larger than that .40 (p > .05). So, many “statistically significant” findings are not significantly larger than correlations that were dismissed as nonsignificant. This highlights the absurdity of simply tallying up differences that reach the threshold of significance, particularly when no confidence intervals are provided.
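The claim can be checked with the standard Fisher r-to-z test. This is a sketch assuming two independent samples of 23; strictly, comparing correlations from the same 23 women calls for a dependent-correlations test, but the qualitative point is the same:

```python
import math
from scipy import stats

def fisher_z_p(r1, r2, n1, n2):
    """Two-tailed p for the difference between two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of the difference
    return 2 * stats.norm.sf(abs(z1 - z2) / se)

print(round(fisher_z_p(0.80, 0.40, 23, 23), 3))  # just under .05
print(round(fisher_z_p(0.60, 0.40, 23, 23), 3))  # nowhere near significant
```

A "significant" r of .60 and a "nonsignificant" r of .40 are statistically indistinguishable at this sample size, yet a tally of significant findings would count only the first.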

9. The graphic representations in Figures 2 and 4 were produced by throwing away two thirds of the available data.

As seen in the bell curve to the left, 68.2% (or roughly 16 of 23) of the women fall within the excluded -1.0 to +1.0 SD range.


Throwing away the data for 16 women leaves 7. These were distributed across the four lines in Figures 2 and 4, one or two to a line. Funky? Yup.
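The count follows directly from the normal distribution. A quick sketch, assuming the 23 women's scores are roughly normally distributed:

```python
from scipy import stats

# Fraction of a normal distribution lying within +/- 1 SD of the mean
inner = stats.norm.cdf(1) - stats.norm.cdf(-1)   # ~0.683

discarded = round(23 * inner)                    # women excluded by the +/- 1 SD cut
retained = 23 - discarded                        # women left for the plotted lines

print(discarded, retained)
```

Sixteen of the 23 women are expected to fall inside the excluded band, leaving about seven to populate the four plotted lines.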

[Figure 2 from the PLOS One article (journal.pone.0079314.g002)]

What We Need to Do to Redeem Psychotherapy Research

This post serves as a supplement to one in PLOS Mind the Brain, Salvaging Psychotherapy Research: a Manifesto. The Mind the Brain post declares

We need to shift the culture of doing and reporting psychotherapy research. We need to shift from praising exaggerated claims about treatment and faux evidence generated to promote opportunities for therapists and their professional organizations. Instead, it is much more praiseworthy to provide robust, sustainable, even if more modest claims and to call out hype and hokum in ways that preserve the credibility of psychotherapy.

The current post provides documentation in the form of citations and further links for the points made there concerning the need to reform the psychotherapy research literature.

Many studies considered positive, including those that become highly cited, are basically null trials.

Two examples of null trials became highly cited because of spin.

Bach, P., & Hayes, S. C. (2002). The use of acceptance and commitment therapy to prevent the rehospitalization of psychotic patients: A randomized controlled trial. Journal of Consulting and Clinical Psychology, 70(5), 1129.

Discussed in these blog posts:

More on the Acceptance and Commitment Therapy Intervention That Failed to Reduce Re-Hospitalization.

Study Did Not Show That Brief Therapy Kept Psychotic Patients Out of Hospital

Here is another trial spun and dressed up:

Dimidjian, S., Hollon, S. D., Dobson, K. S., Schmaling, K. B., Kohlenberg, R. J., Addis, M. E., … & Jacobson, N. S. (2006). Randomized trial of behavioral activation, cognitive therapy, and antidepressant medication in the acute treatment of adults with major depression. Journal of Consulting and Clinical Psychology, 74(4), 658.

The Dimidjian et al. trial launched interest in behavioral activation as a Third Wave psychotherapy and has been cited many times, almost always uncritically. What could possibly be wrong with the study? I will have to blog about that sometime, but check it out using the link to the PDF that I provided. Hint: whatever happened to the obviously missing presentation of the main time x treatment interactions for the primary outcome? What was emphasized instead, and why?

Spin starts in abstracts

Discussed in a pair of blog posts:

Investigating the Accuracy of Abstracts: An Introduction

Dissecting a Misleading Abstract

When controls are introduced for risk of bias or investigator allegiance, effects greatly diminish or even disappear.

An example is

Jauhar, S., McKenna, P. J., Radua, J., Fung, E., Salvador, R., & Laws, K. R. (2014). Cognitive-behavioural therapy for the symptoms of schizophrenia: systematic review and meta-analysis with examination of potential bias. The British Journal of Psychiatry, 204(1), 20-29.

For an interesting discussion of how much meta-analyses of the same literature can vary in their conclusions depending on whether risk of bias and investigator allegiance are taken into account, see

Meta-Matic: Meta-Analyses of CBT for Psychosis

Conflicts of interest associated with authors having substantial financial benefits at stake are rarely disclosed in the studies that are reviewed or the meta-analyses themselves.

I recently blogged about these two articles

Sanders, M. R., Kirby, J. N., Tellegen, C. L., & Day, J. J. (2014). The Triple P-Positive Parenting Program: A systematic review and meta-analysis of a multi-level system of parenting support. Clinical Psychology Review, 34(4), 337-357.


Sanders, M. R., & Kirby, J. N. (2014). Surviving or Thriving: Quality Assurance Mechanisms to Promote Innovation in the Development of Evidence-Based Parenting Interventions. Prevention Science, 1-11.


Critical analysis of a meta-analysis of a treatment by authors with financial interests at stake


Are meta-analyses done by promoters of psychological treatments as tainted as those done by Pharma?

And here.

Sweetheart relationship between Triple P Parenting and the journal Prevention Science?

There are low thresholds for professional groups such as the American Psychological Association Division 12 and governmental organizations such as the US Substance Abuse and Mental Health Services Administration (SAMHSA) declaring treatments to be “evidence-supported.”

I blogged about this.

Troubles in the Branding of Psychotherapies as “Evidence Supported”

Professional groups have conflicts of interest in wanting their members to be able to claim the treatments they practice are evidence-supported.

I have blogged about the Society for Behavioral Medicine a number of times

Faux Evidence-Based Behavioral Medicine at Its Worst (Part I)

Faux Evidence-Based Behavioral Medicine Part 2

Does psychotherapy work for depressive symptoms in cancer patients?

Some studies find differences between two active, credible treatments

This does not happen very often, but here is such a study:

Poulsen, S., Lunn, S., Daniel, S. I., Folke, S., Mathiesen, B. B., Katznelson, H., & Fairburn, C. G. (2014). A randomized controlled trial of psychoanalytic psychotherapy or cognitive-behavioral therapy for bulimia nervosa. American Journal of Psychiatry, 171(1), 109-116.

that I blogged about

When Less is More: Cognitive Behavior Therapy vs Psychoanalysis for Bulimia

Bogus and unproven treatments are promoted with pseudoscientific claims.

Here is a website offering APA approved continuing education credit for Somatic Experiencing

Somatic Experiencing® is a short-term naturalistic approach to the resolution and healing of trauma developed by Dr. Peter Levine. It is based upon the observation that wild prey animals, though threatened routinely, are rarely traumatized. Animals in the wild utilize innate mechanisms to regulate and discharge the high levels of energy arousal associated with defensive survival behaviors. These mechanisms provide animals with a built-in ‘’immunity’’ to trauma that enables them to return to normal in the aftermath of highly ‘’charged’’ life-threatening experiences.

Declarations of conflicts of interest are rare, and exposure of authors who routinely fail to disclose conflicts of interest is even rarer.

I blogged about this:

Sweetheart relationship between Triple P Parenting and the journal Prevention Science?

Departures from preregistered protocols in published reports of RCTs are common, and there is little checking of discrepancies in abstracts from results that were actually obtained or promised in preregistration.

Here is a notable recent example about which I blogged here, here and here.

Morrison, A. P., Turkington, D., Pyle, M., Spencer, H., Brabban, A., Dunn, G., … & Hutton, P. (2014). Cognitive therapy for people with schizophrenia spectrum disorders not taking antipsychotic drugs: a single-blind randomised controlled trial. The Lancet, 383(9926), 1395-1403.

See also

Milette, K., Roseman, M., & Thombs, B. D. (2011). Transparency of outcome reporting and trial registration of randomized controlled trials in top psychosomatic and behavioral health journals: A systematic review. Journal of Psychosomatic Research, 70(3), 205-217.

Specific journals are reluctant to publish criticism of their publishing practices.

Cook, J. M., Palmer, S. C., Hoffman, K., & Coyne, J. C. (2007). Evaluation of clinical trials appearing in Journal of Consulting and Clinical Psychology: CONSORT and beyond. The Scientific Review of Mental Health Practice, 5, 69-80.

This article attempted to point out shortcomings in the reporting of clinical trials in the Journal of Consulting and Clinical Psychology and was first submitted there and rejected. As you can see, we in no way intended to bash the journal, only to highlight the need for adopting and enforcing CONSORT. The manuscript was sent out for review to two former editors, who understandably took issue with its depiction of the quality of the clinical trials they had accepted for publication. Fortunately, after JCCP rejected the article, we were able to publish it elsewhere.

Those of us around on the listservs in the early 2000s can recall how aggressively APA resisted adoption of CONSORT. Finally APA relented with

Guidelines seek to prevent bias in reporting of randomized trials

But it contained an escape clause: all authors had to do was fail to declare that their study was an RCT. But making that disclosure is part of adhering to CONSORT!

Authors of APA journal articles who call a clinical trial a “randomized controlled trial” (RCT) are now required to meet the basic standards and principles outlined in the Consolidated Standards of Reporting Trials (CONSORT) guidelines as part of an effort to improve clarity, accuracy and fairness in the reporting of research methodology.

We complained and the escape clause was eliminated, even if enforcement of CONSORT remained spotty.

Coyne, J. C., Cook, J. M., Palmer, S. C., & Rusiewicz, A. (2004). Clarification of clinical trial standards. Monitor on Psychology: A Publication of the American Psychological Association, 35(11), 4-8.

If the title or abstract of a paper reporting an RCT does not explicitly state “randomized clinical trial,” there is a risk it will be lost in any initial search of the literature. We propose that if editors and reviewers recognize that a study reports a randomized clinical trial, they will require that authors label it as such and that they respond to the CONSORT checklist.


No longer should underpowered exploratory pilot and feasibility studies be passed off as RCTs when they achieve positive results.

An excellent discussion of this issue can be found in

Kraemer, H. C., Mintz, J., Noda, A., Tinklenberg, J., & Yesavage, J. A. (2006). Caution regarding the use of pilot studies to guide power calculations for study proposals. Archives of General Psychiatry, 63(5), 484-489.


Leon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The role and interpretation of pilot studies in clinical research. Journal of Psychiatric Research, 45(5), 626-629.

Evaluations of treatment effects should take into account the prior probabilities suggested by the larger literature on comparisons between two active, credible treatments. The well-studied depression treatment literature suggests some parameters.

Cuijpers, P., & van Straten, A. (2011). New psychotherapies for mood and anxiety disorders: Necessary innovation or waste of resources? Canadian Journal of Psychiatry, 56(4), 251.

On the one hand, there is a clear need for better treatments, as mood and anxiety disorders constitute a considerable burden for patients and society. Further, modelling studies have shown that current treatments can reduce only one-third of the disease burden of depression and less than one-half of anxiety disorders, even in optimal conditions.2

However, there are already dozens of different types of psychotherapy for mood and anxiety disorders, and there is very little evidence that the effects of treatments differ significantly from each other. In depression, we found that interpersonal psychotherapy is somewhat more effective than other therapies,3 but differences were very small (Cohen’s d < 0.21) and the clinical relevance is not clear. In the field of anxiety disorders, there is evidence that relaxation is less effective than cognitive-behavioural therapy, but there is very little evidence for significant differences between other therapies.

We think that new therapies are only needed if the additional effect compared with existing therapies is at least d = 0.20. Larger effect sizes are not reasonable to expect as 0.20 is the largest difference between therapies found until now. Further, this effect needs to be empirically demonstrated in high-quality trials.

However, to show such an effect of 0.20 we would need huge numbers. A simple power calculation shows that this would require a trial of about 1000 participants (STATA[Statacorp, College Station, TX] sampsi command). As a comparison, the large National Institute of Mental Health Treatment of Depression Collaborative Trial examining the effects of treatments of depression included only 250 patients.
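The sample size in the quoted passage can be sanity-checked with a back-of-the-envelope normal-approximation calculation. This is only a sketch of the arithmetic, not the Stata sampsi routine the authors actually used, and the normal approximation is slightly optimistic compared with an exact t-test calculation:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison
    of means, detecting a standardized difference d (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# Detecting d = 0.20 between two active therapies:
print(2 * n_per_group(0.20))              # total n at 80% power: 786
print(2 * n_per_group(0.20, power=0.90))  # total n at 90% power: 1052
```

Roughly 800 to 1,050 participants in total, depending on the power demanded, which is consistent with the authors' figure of "about 1000" and dwarfs the 250 patients of the NIMH collaborative trial.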


Adverse events and harms should routinely be reported.

Vaughan, B., Goldstein, M. H., Alikakos, M., Cohen, L. J., & Serby, M. J. (2014). Frequency of reporting of adverse events in randomized controlled trials of psychotherapy vs. psychopharmacotherapy. Comprehensive Psychiatry.

Meta-analyses of psychotherapy should incorporate techniques for detecting p-hacking.

This is discussed in

Lakens, D., & Evers, E. R. (2014). Sailing from the seas of chaos into the corridor of stability: Practical recommendations to increase the informational value of studies. Perspectives on Psychological Science, 9(3), 278-292.


Sweetheart relationship between Triple P Parenting and the journal Prevention Science?

Update June 1, 2014, 8:50 pm US EDT: It has just been called to my attention that, until December of last year, Robert McMahon was Editor of Prevention Science. He is a member of the Triple P Parenting International Scientific Advisory Committee. This makes the possible relationship between Triple P and the journal, including undisclosed conflicts of interest being let slip by, all the more troubling.

Not one, but two problematic papers…

Imagine the CEO of a pharmaceutical company writing a spirited defense of the need for drug companies to be involved in evaluations of their products. Drug company involvement is trumpeted as a means of ensuring the quality of the evaluation and the dependability of the results. Independent evaluation is seen as carrying the risk of skepticism or motivation to discredit a product, tainting the results.

Imagine that this occurred in an article that only in passing mentioned the extensively documented effects of conflicts of interest on published reports of the efficacy of drugs.

Finally, imagine that the article was published in a journal with an explicit requirement for disclosure of conflicts of interest, but with no disclosure.

But, no, I am not talking about an article by the CEO of a pharmaceutical company. The article was written by promoters of Triple P Parenting (3P), who consistently publish evaluations of their own treatment without conflict of interest statements.

Sanders, M. R., & Kirby, J. N. (2014). Surviving or Thriving: Quality Assurance Mechanisms to Promote Innovation in the Development of Evidence-Based Parenting Interventions. Prevention Science, 1-11.

Going to the website of their university, one can find an intellectual property statement that provides for lavish rewards for those who develop products such as psychological treatments.

 Any Net Proceeds of Commercialisation received by the University pursuant to this Policy will be disbursed as follows:

(a) 1/3 to the Contributors determined in accordance with the relevant Procedures;

(b) 1/3 to the University, which will be retained by a University Commercialisation Company where the Commercialisation of the IP occurs through that University Commercialisation Company; and

(c) 1/3 to the relevant faculty or institute of the University.

3P promoters appear to be in a position to make lots of money from their treatment.

Prevention Science has a person on its editorial board who is actively involved in promoting 3P: Ronald J. Prinz. It is also the journal of the Society for Prevention Research. At the society’s website, you can see

The Society has an interest that its members, in their research, teaching, practice and service, shall be alert and attentive to situations that might cause a conflict of interest (or the appearance of a conflict), and to take appropriate action either to prevent such a conflict or to disclose it to all appropriate parties.

Such conflicts of interest may arise out of commitments involving honoraria, consultant relationships, participation in a speakers bureau, stock holdings or options, royalties, ownership of a company or patent, research contracts or grants, and, in some instances, being an official representative of another organization.

Key Guiding Principles

……SPR members should disclose relevant sources of financial support, and pertinent personal or professional interests, to their employers or clients, to the sponsors of their professional work, and in public speeches and written publications.

Conflicts of interest, even substantial financial ones, do not necessarily discredit an evaluation. Undisclosed conflicts of interest do so because they prevent readers from independently evaluating whether they should accept the claims being made.

You might be motivated to write a letter to the editor of Prevention Science complaining about this lapse in its disclosure policy. But don’t waste your time. Prevention Science summarily dismisses letters to the editor as a matter of policy. The journal does not subscribe to the so-called Pottery Barn Rule: journals should have a means for readers to call attention to the inevitable lapses in peer review and to unreliability in published claims.

What might the promoters of 3P have been up to in writing this article?

The evidence-supported status of 3P has been questioned, starting with a meta-analysis and my commentary that demonstrated that the bulk of randomized trials evaluating 3P consisted of underpowered, methodologically flawed studies conducted by persons with conflicts of interest.

After having failed to suppress publication of this meta-analysis, promoters of 3P mounted a response, including circulating on the web a meta-analysis of their own that they apparently had to withdraw, before eventually publishing it in Clinical Psychology Review.

To focus on the virtues of developer involvement shifts discussions away from broader issues of the well-established taint of financial conflicts of interest. Just as financial conflicts of interest are not limited to the developers of drugs, conflicts of interest can be manifested by those who promote the treatments and benefit from the treatment being declared evidence-supported.

With conflict of interest statements so infrequent in the 3P literature, it is already quite difficult, if not impossible, to establish financial conflicts of interest in those publishing evaluations of the treatment, other than those obviously associated with its development.

This article muddies the water and claims that there is a demonstration that conflict of interest, redefined as developer involvement, does not matter.

A recent systematic review and meta-analysis of Triple P (Kirby and Sanders 2014) included 101 studies, 68 of which were randomized controlled trials of specific programs within the Triple P system (e.g., the Level 4 Group Triple P program), of which 66 were peer-reviewed publications. A range of moderators were examined including developer involvement. Developer involvement was classified into two categories: (a) any developer involvement or (b) no involvement. Seventy papers were categorized as having some level of developer involvement, whereas 31 papers had no developer involvement. Using structural equation modeling, the meta-analysis revealed that both developer led studies and studies with no developer involvement produced significant small to medium effect sizes for a range of child and parent outcomes (Kirby and Sanders 2014). This is the first time a meta-analysis has examined the level of developer involvement as a moderator variable for potential intervention effects. Importantly, both developer led and independent evaluations showed similar positive effects on a range of child and parent outcome measures.

If one goes to the reference list of the Prevention Science article, the Kirby and Sanders (2014) article that is cited does not contain these data. So, readers cannot evaluate these extraordinary claims and reviewers apparently did not check either.

Why did the reviewers at Prevention Science not check out a claim that is the crux of the credibility of the article they accepted? We can only speculate. The article circulated on the web by 3P promoters and then withdrawn included 101 studies. Maybe the reference was to that. And the subsequent Clinical Psychology Review meta-analysis includes 101 studies. But it has different authors. And, as I have been documenting at PLOS Mind the Brain, it is a horribly conducted and reported meta-analysis, aside from its lack of a statement of conflicts of interest.

One slip-up in the review and publication of a questionable manuscript with undisclosed conflicts of interest does not necessarily indicate a sweetheart deal between the promoters of a treatment and a journal.

But then there is a more egregious article in the same journal:

Prinz, R. J., Sanders, M. R., Shapiro, C. J., Whitaker, D. J., & Lutzker, J. R. (2009). Population-based prevention of child maltreatment: The US Triple P system population trial. Prevention Science, 10(1), 1-12.

The article describes outcomes of the Triple P System Population Trial (TPSPT) in South Carolina, important as one of only five population-level tests of 3P and the only one conducted in the United States. It carries considerable weight, has gotten considerable publicity, and has proven decisive in the designation of 3P as evidence-supported.

The study was not pre-registered at a publicly accessible site. However, the details of the study were published in Clinical Psychology Review ahead of the Prevention Science article, but years after data collection had begun. The Prevention Science article cites this.

Prinz, R. J., & Sanders, M. R. (2007). Adopting a population-level approach to parenting and family support interventions. Clinical Psychology Review, 27(6), 739-749.


Professor Manuel Eisner

A working paper by Manuel Eisner, Professor of Comparative & Developmental Criminology and Deputy Director of the Institute of Criminology, Cambridge University, points to some serious discrepancies between what was reported in the Prevention Science article and what appeared in Clinical Psychology Review. Discrepancies include contradictory descriptions of outcome data collection and analysis and of the designation of the primary outcome. There is a strong suggestion of selective reporting.

Eisner, M. (2014). The South Carolina Triple P System Population Trial to Prevent Child Maltreatment: Seven Reasons to be Sceptical about the Study Results. Violence Research Centre Working Paper. University of Cambridge Institute of Criminology.

A table from the working paper detailing discrepancies is reproduced below.

Table 1. TPSPT study design as reported in Prinz and Sanders (2007) and findings reported in Prinz et al. (2009).

| Criterion | Reported in Prinz and Sanders (2007: 746-748) | Reported in Prinz et al. (2009) |
|---|---|---|
| Randomization | Pair-wise matching | Stratified random assignment |
| Age range | 0-7 | 0-8 |
| Population survey | Annual assessments | A pre- and a post-intervention survey |
| Baseline reference period for official data | 5 years before the intervention | Last year before the intervention |
| Outcomes – survey based | Parent knowledge of Triple P | Reported |
|  | Parent involvement in Triple P | [Not reported] |
|  | Relationship between demographics and participation | [Not reported] |
|  | Parenting practices | [Not reported] |
|  | Parent confidence and stress | [Not reported] |
|  | Child maladjustment | [Not reported] |
| Outcomes – archival records | Reported CM | [Not reported] |
|  | Substantiated physical maltreatment | [Not reported] |
|  | Substantiated neglect | [Not reported] |
|  | Substantiated sexual abuse | [Not reported] |
|  | Substantiated overall CM | Reported |
|  | Out-of-home placements | Reported |
|  | Child CM injuries [not planned] | Reported (supplementary analysis) |

Eisner raises questions about the basic credibility of the results reported in Prevention Science:

 TPSPT claims large benefits from treatment exposure by about 10% of a universal target population. Yet TPSPT is a short intervention that mainly entails tip sheets, seminars to large audiences, and brief consulting sessions. So the claim is that this modestly intensive intervention for any one family is sufficient to reduce child maltreatment by between 28% (substantiated child abuse) and 44% (out of home placements) at the population level when about 17% of families receive this modestly intensive intervention. While logically not impossible, the claim lacks face validity.

SPR has an explicit, relevant policy:

 SPR members should not misrepresent their procedures, data or findings. They should report their findings fully and should not omit relevant data. They should report study results regardless of whether these results support or contradict expected outcomes. Within reasonable limits, they should attempt to prevent or correct any substantial misuses of their work by others.

Two questionable articles slipping through the peer review process without needed disclosures of conflicts of interest, and a promoter of 3P on the editorial board, do not establish a sweetheart relationship. But they are sufficient evidence for Prevention Science to adopt the Pottery Barn Rule and allow letters to the editor identifying lapses in peer review. And they illustrate the problems in accepting promoters’ evaluations of their own treatments.


Ethical problems of Clinical Psychology Review with Triple P Parenting

Update May 17, 2014. One of my friends, Professor Jon Elhai, is listed on the editorial board of Clinical Psychology Review. I wrote to him to ask his opinion of my blog post. He said that I had to be mistaken; he had never agreed to join the editorial board. He is nonetheless listed there without his permission.

I have now included a link to the formal complaint to the Committee on Publication Ethics (COPE) filed by Philip Wilson, MD, PhD, the author who was subjected to mistreatment when he submitted a manuscript to this journal. And here is a link to the committee’s correspondence with the Editor of Clinical Psychology Review.

The editors of Clinical Psychology Review have ethical problems.

They mishandled two manuscripts concerning meta-analyses of Triple P Parenting programs. Until these problems are resolved

  • Anyone contemplating submitting a manuscript to that journal should have second thoughts.
  • Readers should be more skeptical about what they read in this journal and what they do not get to read because of pressures on authors to express a particular point of view.
Photo Credit: Bill Burris

Skepticism is always a good stance to take with meta-analyses. If you cannot read meta-analyses skeptically, you should not be reading them. But recent events make skepticism particularly important with Clinical Psychology Review.

A breach of confidentiality occurred in the review of a manuscript from Philip Wilson, MD, that was critical of Triple P Parenting. He was subjected to pressures in his workplace triggered by disclosures from someone associated with the review process. Wilson had not told administrators in his workplace about his manuscript. Without explanation, persons associated with Triple P also sent Wilson papers published after his meta-analysis had been completed.

I am unimpressed by the lack of diligence the editors have shown in dealing with Wilson’s complaints to the journal. He alerted the editors to the problems and asked whether anyone in the review process had a conflict of interest. They would not reply. He has now appropriately moved his complaint to the Committee on Publication Ethics (COPE).

Promoters of Triple P subsequently published a meta-analysis in Clinical Psychology Review without a disclosure of conflicts of interest. Meta-analyses done by persons with financial stakes in the evaluation of a treatment are always highly suspect. And this meta-analysis was obviously written in response to the bad publicity generated by publication elsewhere of Wilson’s manuscript, which had previously been savaged at Clinical Psychology Review. The article cited Wilson but misrepresented his criticisms.

One of the reasons cited by the editor of Clinical Psychology Review for rejecting Wilson’s manuscript was that there were already too many reviews of Triple P Parenting. Apparently that does not apply to another meta-analysis from promoters of Triple P. Note also that one of the “too many” other meta-analyses was by authors with financial interests in Triple P.

Triple P Parenting has been highly lucrative for its promoters. Their financial interests have been put at risk by the publication of Wilson’s paper and the attention it is generating. Wilson’s paper documented the weakness of the evidence for the effectiveness of Triple P and the thorough tainting of what evidence there is by involvement by promoters of Triple P.

The European Early Childhood Education Research Journal has already echoed the charges made in Wilson’s paper. An editorial announced that it was tightening its requirements for disclosure of conflicts of interest.

Other authors have now gone public with complaints of unfair reviews of their papers with negative findings concerning Triple P, of the difficulties they faced getting their papers published, and of pressures from its promoters. The status of Triple P as empirically supported has been seriously challenged. And it is getting easier to publish honestly reported negative findings from clinical trials. I saw a similar phenomenon when the validity of Type D personality was put into question and all of a sudden more negative trials without spin began to be published.

 Time for my disclosure.

I have not met Phil Wilson, but I was persuaded by his article in BMC Medicine to take a closer look at the Triple P literature. I concluded that he was actually being too easy on the quality of the evidence for the intervention. He noted serious methodological problems, but missed just how much claims of efficacy depended on poor quality studies that were so small and underpowered that their rate of positive findings was statistically improbable.

I blogged about this, stirring some further controversy that ultimately led to an invitation to expand the blog post into an article at BMC Medicine.

Wilson’s meta-analysis

The meta-analysis is well reasoned and carefully conducted, but scathing in its conclusion:

In volunteer populations over the short term, mothers generally report that Triple P group interventions are better than no intervention, but there is concern about these results given the high risk of bias, poor reporting and potential conflicts of interest. We found no convincing evidence that Triple P interventions work across the whole population or that any benefits are long-term. Given the substantial cost implications, commissioners should apply to parenting programs the standards used in assessing pharmaceutical interventions.

My re-evaluation

My title says it all: Triple P-Positive Parenting programs: the folly of basing social policy on underpowered flawed studies.

My abstract

Wilson et al. provided a valuable systematic and meta-analytic review of the Triple P-Positive Parenting program in which they identified substantial problems in the quality of available evidence. Their review largely escaped unscathed after Sanders et al.’s critical commentary. However, both of these sources overlook the most serious problem with the Triple P literature, namely, the over-reliance on positive but substantially underpowered trials. Such trials are particularly susceptible to risks of bias and investigator manipulation of apparent results. We offer a justification for the criterion of no fewer than 35 participants in either the intervention or control group. Applying this criterion, 19 of the 23 trials identified by Wilson et al. were eliminated. A number of these trials were so small that it would be statistically improbable that they would detect an effect even if it were present. We argued that clinicians and policymakers implementing Triple P programs incorporate evaluations to ensure that goals are being met and resources are not being squandered.

You can read the open access article, but here is the crux of my critique

Many of the trials evaluating Triple P were quite small, with eight trials having less than 20 participants (9 to 18) in the smallest group. This is grossly inadequate to achieve the benefits of randomization and such trials are extremely vulnerable to reclassification or loss to follow-up or missing data from one or two participants. Moreover, we are given no indication how the investigators settled on an intervention or control group this small. Certainly it could not have been decided on the basis of an a priori power analysis, raising concerns of data snooping [14] having occurred. The consistently positive findings reported in the abstracts of such small studies raise further suspicions that investigators have manipulated results by hypothesizing after the results are known (harking) [15], cherry-picking and other inappropriate strategies for handling and reporting data [16]. Such small trials are statistically quite unlikely to detect even a moderate-sized effect, and that so many nonetheless get significant findings attests to a publication bias or obligatory replication [17] being enforced at some points in the publication process.
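The arithmetic behind "statistically quite unlikely to detect even a moderate-sized effect" can be sketched with a normal-approximation power calculation. The group sizes (9 to 18 in the smallest group) come from the passage above; the moderate effect size of d = 0.5 is my illustrative assumption, and the approximation is rough rather than exact:

```python
import math
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means with
    n_per_group participants per arm (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(d * math.sqrt(n_per_group / 2) - z_alpha)

# Smallest groups in the Triple P trials ranged from 9 to 18 participants.
# Even for a moderate effect (d = 0.5), power falls far below the usual 80%:
print(round(power_two_sample(0.5, 9), 2))   # ~0.18
print(round(power_two_sample(0.5, 18), 2))  # ~0.32
```

With power in the range of roughly 18 to 32 percent, a literature in which nearly every such trial reports significant positive findings is improbable without selective reporting or publication bias, which is the point of the critique.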

A dodgy history and questions about the stringency of review at Clinical Psychology Review

The meta-analysis published in Clinical Psychology Review was first distributed on the web with a statement that it was under review at Monographs of the Society for Research in Child Development.

Most journals have a strict policy forbidding circulation of papers labeled as being under review by them. The American Psychological Association and many journals explicitly warn authors not to do this, because use of the journal’s name may lend credibility to a paper that might ultimately be rejected. Or it may generate confusion if the paper is later accepted, but in a highly revised form. Readers of the unpublished paper might not check to see whether its claims held up through peer review.

Apparently, the manuscript was rejected at Monographs of the Society for Research in Child Development.

You can take a look at the manuscript here. It was much too long for Clinical Psychology Review, and so some cuts had to be made before submitting it there. But this earlier manuscript allows a comparison with what actually got published in Clinical Psychology Review and raises serious questions about the stringency of the review.

Prior to submission to this first journal, the meta-analysis was publicly preregistered with the International Prospective Register of Systematic Reviews (PROSPERO) under the number CRD42012003402.

Publicly accessible preregistration of meta-analyses facilitates transparency and reduces the likelihood that authors will revise their hypotheses after peeking at the results. PLOS One routinely reminds Academic Editors and reviewers to consult the preregistration of meta-analyses. Apparently that was not done at Clinical Psychology Review, but if anyone had bothered to look, they would have found some interesting things.

  • The preregistration clearly reports important conflicts of interest on the part of the authors that were not disclosed in the Clinical Psychology Review article.
  • What was promised in the preregistration concerning comparisons of Triple P Parenting with active treatments did not occur in the Clinical Psychology Review article. The result was inflation of the effect sizes reported for Triple P Parenting.

I find it extraordinary that Clinical Psychology Review did not require disclosure of conflicts of interest in the publication of the promoters’ meta-analysis, particularly after the editors had been sensitized to the issue by the rejected paper.

We should all be uncomfortable with the appearance of a lack of integrity in the review processes at Clinical Psychology Review. I would be reluctant to submit a manuscript there without these issues being resolved. As seen in my numerous blog posts, I often raise issues about the quality of the evidence mustered in support of particular therapies being declared evidence-based. I sometimes get sandbagged in blinded peer review. This often cannot be anticipated, but when instances are called to the attention of journal editors, they should do something other than stonewall.

I would not want to be in a situation similar to what Wilson has faced: getting my manuscript savaged by reviewers with a conflict of interest who then attempt to pressure institutions to silence me, and then having Clinical Psychology Review publish an article meant to counter my paper if I succeeded in getting it published elsewhere, again without disclosure of the authors’ conflicts of interest.

Given that the journal has not followed the steps laid out in Elsevier’s own Publishing Ethics Resource Kit (PERK) flowcharts, I think the task falls to the publisher to demonstrate that the situation has been resolved and that any potential threat posed to other authors by the circumstances at the journal has been removed.

I will be blogging at length at PLOS Mind the Brain about Sanders’ meta-analysis in Clinical Psychology Review. I think it is a case study in utter disregard for the standards of conducting and reporting a meta-analysis, and yet it got published in a reputable journal. Whatever peer review this article received was inadequate.

Among the problems that I will document:

The meta-analysis:

  • Substantially distorted the presentation of results in the abstract.
  • Excluded comparisons between Triple P Parenting and active interventions after promising them in the protocol; this exclusion inflated the reported effect size for Triple P.
  • Combined nonrandomized studies with RCTs in ways that inflate estimates of Triple P’s effect size.
  • Cherry-picked and misrepresented the existing literature concerning Triple P Parenting.
  • Cherry-picked and misrepresented the existing methodological literature to create the appearance of support for decisions made in undertaking and interpreting a meta-analysis; choices made in the Triple P meta-analysis actually contradict recommendations in the literature cited in support of those choices.
  • Promised analyses of heterogeneity, but these were never reported or interpreted; they could potentially have revealed that combining results from very different studies was not appropriate.
  • Failed to offer any interpretation in the text of the substantial effects of investigator involvement in intervention trials (i.e., conflict of interest) and of study size.
  • Offered glowing summaries in the discussion that contradicted what was actually found.