Dolly Parton and the MOE’s Best Evidence Research Synthesis

Given that so much of what “we do” in education in New Zealand is directed by meta-analysis, including all our Best Evidence Research Synthesis reports, reading Gene Glass on “Meta-Analysis at 25” makes me feel insecure and vulnerable…

For… much as reading Phil Rosenzweig’s The Halo Effect undermined my faith in the Best Evidence Synthesis on School Leadership and Student Outcomes, Glass’ meta-analysis critique promises to betray me…

It is important to note here that I don’t know enough statistics to refute Glass. This lack of statistical grounding probably explains why, as I read Glass’ paper, I am reduced to imagining myself Dolly Parton-ised, belting out “Jolene, Jolene, Jolene, Jolene… I’m begging of you, please don’t take my man…”

If I sing loudly enough, perhaps I can stop the following Jolene thoughts from undermining all the MoE “evidence-based practice” and “Best Evidence Synthesis” stuff in my mind.

Jolene Thought #1: Does the process of selecting and discarding studies described in the School Leadership and Student Outcomes BES amount to what Glass calls censorship “by some a priori set of prejudices”?

… newly minted confections such as "best evidence research synthesis," a come-lately contribution that added nothing whatsoever to what myself and many others had been saying repeatedly on the question of whether meta-analyses should use all studies or only "good" studies.

I remain staunchly committed to the idea that meta-analyses must deal with all studies, good bad and indifferent, and that their results are only properly understood in the context of each other, not after having been censored by some a priori set of prejudices. An effect size of 1.50 for 20 studies employing randomized groups has a whole different meaning when 50 studies using matching show an average effect of 1.40 than if 50 matched groups studies show an effect of -.50, for example. Glass 2000 Meta-Analysis at 25
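Glass’ example is easy to check with a little arithmetic. This Python sketch is not from his paper: it assumes every study counts equally (real meta-analyses usually weight studies by their precision) and uses only the numbers in his example, comparing the overall average with the gap between randomized and matched designs.

```python
# A minimal sketch of the arithmetic in Glass's example.
# Assumption: every study counts equally (no precision weighting).

def pooled_effect(groups):
    """Mean effect size across groups given as (number_of_studies, mean_effect)."""
    total_studies = sum(n for n, _ in groups)
    return sum(n * effect for n, effect in groups) / total_studies

randomized = (20, 1.50)          # 20 randomized-groups studies, mean ES 1.50
matched_agree = (50, 1.40)       # scenario A: 50 matched studies, mean ES 1.40
matched_disagree = (50, -0.50)   # scenario B: 50 matched studies, mean ES -0.50

for label, matched in [("A", matched_agree), ("B", matched_disagree)]:
    pooled = pooled_effect([randomized, matched])
    design_gap = randomized[1] - matched[1]
    print(f"Scenario {label}: pooled ES = {pooled:.2f}, "
          f"randomized minus matched = {design_gap:.2f}")
```

Scenario A prints a pooled effect of 1.43 with a design gap of 0.10, so the 1.50 looks robust whichever design you trust. Scenario B prints 0.07 with a gap of 2.00: the finding now depends almost entirely on study design, something a "good studies only" synthesis reporting 1.50 would never show you.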

Jolene Thought #2: Is it valid to make inferences from meta-analysis-generated “effect sizes” when the nature of the studies used means that a meta-analysis must lack both “a well defined population that has been randomly sampled” and “subjects that have been randomly assigned to conditions in a controlled experiment”?

Moreover, the typical meta-analysis virtually never meets the condition of probabilistic sampling of a population.

It is common to acknowledge, in meta-analysis and elsewhere, that many data sets fail to meet probabilistic sampling conditions, and then to argue that one ought to treat the data in hand "as if" it were a random sample of some hypothetical population. One must be wary here of the slide from "hypothesis about a population" into "a hypothetical population." They are quite different things, the former being standard and unobjectionable, the latter being a figment with which we hardly know how to deal.

… If the sample is fixed and the population is allowed to be hypothetical, then surely the data analyst will imagine a population that resembles the sample of data. If I show you a handful of red and green M&Ms, you will naturally assume that I have just drawn my hand out of a bowl of mostly red and green M&Ms, not red and green and brown and yellow ones. Hence, all of these "hypothetical populations" will be merely reflections of the samples in hand and there will be no need for inferential statistics. Or put another way, if the population of inference is not defined by considerations separate from the characterization of the sample, then the population is merely a large version of the sample. With what confidence is one able to generalize the character of this sample to a population that looks like a big version of the sample? Well, with a great deal of confidence, obviously. But then, the population is nothing but the sample writ large and we really know nothing more than what the sample tells us in spite of the fact that we have attached misleadingly precise probability numbers to the result. Glass 2000 Meta-Analysis at 25

Jolene Thought #3: Are the statistical tests used to determine homogeneity in meta-analysis studies adequate for the task?

Once a hypothesis of homogeneity is accepted by Hedges's test, one is advised to treat all studies within the ensemble as the same. Experienced data analysts know, however, that there is typically a good deal of meaningful covariation between study characteristics and study findings even within ensembles where Hedges's test can not reject the homogeneity hypothesis. … The best data exploration and discovery are more complex and convincing than the most exact inferential test. Glass 2000 Meta-Analysis at 25
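Glass’ point about Hedges's test can be sketched in Python. The five studies below are invented for illustration, and the code uses the usual Q statistic behind the homogeneity test, Q = Σ wᵢ(dᵢ − d̄)² with wᵢ = 1/vᵢ, compared against a chi-square distribution with k − 1 degrees of freedom. To stay dependency-free it uses a closed-form chi-square tail that only works for even degrees of freedom (five studies give four).

```python
import math

# Hypothetical data: five studies of one intervention (invented for illustration).
weeks = [2, 4, 6, 8, 10]                    # study characteristic: intervention length
effects = [0.10, 0.30, 0.35, 0.60, 0.65]    # standardized mean differences (d)
variances = [0.04] * 5                      # sampling variance of each d

def q_statistic(effects, variances):
    """Homogeneity statistic Q = sum(w_i * (d_i - d_bar)^2) with w_i = 1 / v_i."""
    weights = [1.0 / v for v in variances]
    d_bar = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return sum(w * (d - d_bar) ** 2 for w, d in zip(weights, effects))

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); exact only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

q = q_statistic(effects, variances)
p = chi2_upper_tail_even_df(q, df=len(effects) - 1)
r = pearson_r(weeks, effects)

print(f"Q = {q:.3f}, p = {p:.3f}")       # homogeneity not rejected at the 0.05 level
print(f"r(weeks, effect) = {r:.2f}")     # yet effects climb with intervention length
```

With these made-up numbers Q is about 5.1 (p around 0.27), so the test cannot reject homogeneity, yet the correlation between intervention length and effect size is about 0.98. With only a handful of imprecise studies the test has little power, which is exactly the sort of covariation Glass says experienced data analysts would not ignore.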

However, my biggest Jolene Thought moment came on reading “Another reason why I'm leery of meta-analyses” on the Respectful Insolence Blog (thanks to a link from Stephen Downes).

Jolene Thought #4: Are meta-analyses nothing more than systematic reviews of the literature with attitude?

The Respectful Insolence Blog post discusses the use of meta-analysis in evidence-based medicine (EBM), a use that does not sound all that different from the use of BES in New Zealand education.

One increasingly common method of trying to make sense of the morass of data addressing various clinical questions is the medical literature phenomenon known as meta-analysis. A meta-analysis is different from a clinical trial in that it is a statistical reanalysis of data from existing trials. Generally the highest quality trials are chosen in accord with the principles of EBM, and the data from these trials is all lumped together and analyzed in order to provide in essence a more rigorous treatment of existing data than a systematic review of the literature. To be subject to meta-analysis, a medical question must have multiple studies addressing it, and the results must be quantitative. Orac The Respectful Insolence Blog

The post goes on to describe an investigation into the meta-analysis of “randomized clinical trials (RCT) and review articles on the efficacy of intravenous magnesium in the early post-myocardial infarction period.”

The investigators found that

…given the same studies and the same extracted and abstracted data, different investigators came to very different conclusions. This is in contrast to the dogma that tells us that meta-analyses represent the most objective method of reviewing large bodies of studies. This study casts considerable doubt on this contention, as the authors point out:

Although systematic reviews with meta-analyses are considered more objective than other types of reviews, our results suggest that the interpretation of the data remains a highly subjective process even among reviewers with extensive experience conducting meta-analyses. The implications are important. The evidence-based movement has proposed that a systematic review with a meta-analysis of RCTs on a topic provides the strongest evidence of support and that widespread adoption of its results should lead to improved patient care. However, our results suggest that the interpretation of a meta-analysis (and therefore recommendations) are subjective and therefore depend on who conducts or interprets the meta-analysis.

The significance of this study is that it doesn't look at differences in the selection of studies for the meta-analysis or the interpretation of or extraction of data from the studies included in the meta-analysis. Every reviewer was given the same package, the same data, and the same statistical analyses of the included studies, thus eliminating this issue. Even given that, reviewers still interpreted the results of the meta-analyses very differently.

The results of this clever exercise provide just one more bit of evidence that leads me to believe that meta-analyses are nothing more than systematic reviews of the literature with attitude. That's not to say that meta-analyses of the literature aren't often useful, just as systematic reviews of the literature, are. They are in the same way that systematic reviews are: They boil down a large number of studies and suggest an interpretation. Let's just not pretend that meta-analyses are so much more objective than a systematic review as to be considered anything more. Orac The Respectful Insolence Blog

Gene Glass has a future-focused suggestion in his paper, one that in 2008 we would most certainly claim as Web 2.0 or even Web 3.0 thinking… he suggests that researchers contribute to online data archives instead of publishing meta-analyses and/or BES studies… it sounds worthy… it sounds like something that just might let me un-Dolly myself and stop singing Jolene…

Meta-analysis needs to be replaced by archives of raw data that permit the construction of complex data landscapes that depict the relationships among independent, dependent and mediating variables.

Five years ago, this vision of how research should be reported and shared seemed hopelessly quixotic. Now it seems easily attainable. The difference is the I-word: the Internet. In 1993, spurred by the ludicrously high costs and glacial turn-around times of traditional scholarly journals, I created an internet-based peer-reviewed journal on education policy analysis (http://epaa.asu.edu). This journal, named Education Policy Analysis Archives, is now in its seventh year of publication, has published 150 articles, is accessed daily without cost by nearly 1,000 persons (the other three paper journals in this field have average total subscription bases of fewer than 1,000 persons), and has an average "lag" from submission to publication of about three weeks.

Two years ago, we adopted the policy that any one publishing a quantitative study in the journal would have to agree to archive all the raw data at the journal website so that the data could be downloaded by any reader. Our authors have done so with enthusiasm. I think that you can see how this capability puts an entirely new face on the problem of how we integrate research findings: no more inaccurate conversions of inferential test statistics into something worth knowing like an effect size or a correlation coefficient or an odds ratio; no more speculating about distribution shapes; no more frustration at not knowing what violence has been committed when linear coefficients mask curvilinear relationships. Now we simply download each others' data, and the synthesis prize goes to the person who best assembles the pieces of the jigsaw puzzle into a coherent picture of how the variables relate to each other. Glass 2000

Source: Artichoke
