Tuesday, January 17, 2012

Intellectual Dishonesty at NCCAM?


NCCAM is the National Center for Complementary and Alternative Medicine (here), the US Federal Government's lead agency for scientific research on complementary and alternative medicine (CAM). A recent investigation by the Chicago Tribune (here) concluded that "...precious research dollars could be better spent elsewhere." The types of studies funded by NCCAM have included inhalation of lavender and lemon to heal a wound (it didn't), coffee enemas for cancer (no effect), mind-body therapies (yoga, massage and acupuncture, the latter found no better than sham acupuncture), energy healing, distant prayer, qigong (manipulation of a universal energy or life force), ginkgo biloba, saw palmetto (no benefit over placebo), etc.

In this post, I will look at one of the NCCAM studies (Sherman et al. 2011) funded to determine "whether yoga is more effective than conventional stretching exercises or a self-care book for primary care patients with chronic low back pain." In a press release, NCCAM concluded "For Low-Back Pain, Yoga More Effective Than Self-Care But Not Stretching" (the original study in the Archives of Internal Medicine is here). The study is of interest for a number of technical-statistical reasons which I will explore in my Random Variation blog (here). In this post, I'm just going to make a simple observation: the presented data do not seem to support the conclusions!

If you look carefully at the graphs from the article (presented above and discussed in a note below), you will notice that the error bars (confidence intervals) at the study's end point (week 26) overlap. Just eyeballing the graph, it appears that self-care (the red line) would eventually catch up to yoga and stretching, which are clearly not different from each other. What if they had run the study out to 32 weeks? To be fair to the authors, they based their conclusions on probability values (p-values), a commonly accepted approach to reporting significant results. Without getting into statistical details, p-values say nothing about whether the difference between treatment and control is large enough to matter. Differences of 1 point on the RDQ (Roland-Morris Disability Questionnaire) would seem to be of questionable importance against 2-point confidence interval spreads (read more on expected values for the RDQ here).
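To see why a small p-value by itself proves little, here is a minimal sketch with made-up numbers (not the Sherman et al. data; the sample size, means, and standard deviations are my assumptions): with enough patients per group, a roughly 1-point RDQ difference can come out "statistically significant" even though, as argued above, a 1-point difference is of questionable clinical importance.

```python
# Sketch: statistical significance vs. clinical importance.
# All numbers are hypothetical -- they are NOT the trial's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 400                                              # assumed patients per group
yoga      = rng.normal(loc=4.0, scale=3.0, size=n)   # hypothetical week-26 RDQ scores
self_care = rng.normal(loc=5.0, scale=3.0, size=n)   # only ~1 point worse on average

t_stat, p_value = stats.ttest_ind(self_care, yoga)   # two-sample t-test
diff = self_care.mean() - yoga.mean()
se = np.sqrt(self_care.var(ddof=1) / n + yoga.var(ddof=1) / n)
ci = (diff - 1.96 * se, diff + 1.96 * se)            # approximate 95% CI for the difference

print(f"mean difference: {diff:.2f} RDQ points")
print(f"95% CI:          ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"p-value:         {p_value:.4g}")
# With a large enough sample the p-value can be tiny even though a ~1-point
# RDQ difference may be too small for patients to notice.
```

The point of the sketch is simply that the p-value answers "is the difference distinguishable from zero?", not "is the difference worth anything to a patient?".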
From the standpoint of NCCAM's mission, which is ambiguous (can they really conclude alternative medicine is a sham and expect continued funding?), the study certainly doesn't support the underlying theory (presented in the causal diagram above, click to enlarge). Yoga is supposed to be superior due to the "mental enhancement" of meditation. Since yoga was no better than stretching, the "mind-body" aspect of the trial appears to have been a failure.

However, the researchers go on to conclude that yoga is safe and should be recommended by physicians for patients with low back pain. I don't follow these conclusions, especially when there is anecdotal evidence of How Yoga Can Wreck Your Body. The researchers neither mention nor test these issues, and I'm not sure why they would continue to recommend yoga. And, in a related editorial in the same journal, Timothy S. Carey, MD, MPH, concludes that "The study by Sherman et al. in this issue is an excellent example of a pragmatic comparative effectiveness trial" (p. 2027). The developing consensus about Comparative Effectiveness Research (CER) is important because it is a cornerstone of attempts to reform the US health care system. I will never have time to review all the CER studies, but I'm getting a queasy feeling in my lower intestinal tract over CER and whether people will actually be treated based on CER studies, let alone CAM.

NOTE: The four graphs above present two of the study's outcome measures, the RDQ score and the Bothersome score ("How bothersome was your back pain?"). The left panel displays the raw scores and the right panel displays scores adjusted to equalize initial conditions. Notice that no confidence intervals were provided for baseline. Statistically, I'm guessing that the initial conditions were not significantly different. Any adjustment that does not expand the end-point confidence intervals to reflect the uncertainty in the initial conditions seems questionable to me.
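One way to let the end-point estimates carry that baseline uncertainty, at least in principle, is to bootstrap subjects and recompute an ANCOVA-style adjusted mean on each resample, so the adjusted interval reflects how the baseline estimate itself varies. The sketch below uses simulated data only; the sample size, the score distributions, and the adjustment formula are my assumptions, not the trial's, and it shows the mechanics rather than a reanalysis of the study.

```python
# Sketch: propagating baseline uncertainty into a baseline-adjusted
# end-point estimate via the bootstrap. Simulated data only.
import numpy as np

rng = np.random.default_rng(0)
n = 90                                        # assumed subjects in one group
baseline = rng.normal(10.0, 4.0, size=n)      # hypothetical baseline RDQ
endpoint = 0.6 * baseline + rng.normal(0.0, 3.0, size=n)  # hypothetical week-26 RDQ

def adjusted_mean(base, end, target_baseline):
    """ANCOVA-style adjustment: shift the end-point mean to a common baseline."""
    slope = np.cov(base, end)[0, 1] / np.var(base, ddof=1)
    return end.mean() + slope * (target_baseline - base.mean())

target = baseline.mean()                      # common baseline value to adjust to
boot_raw, boot_adj = [], []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)          # resample subjects with replacement
    boot_raw.append(endpoint[idx].mean())
    boot_adj.append(adjusted_mean(baseline[idx], endpoint[idx], target))

for name, draws in [("raw end-point", boot_raw), ("baseline-adjusted", boot_adj)]:
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{name:>18}: 95% CI = ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

Because each bootstrap replicate re-estimates both the baseline mean and the adjustment slope, the resulting interval reflects the uncertainty of the initial conditions rather than quietly inheriting the raw end-point standard errors.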
