Published 1980
Dear Editor: The article by S. Alan Cohen and Joan S. Hyman (ER, December 1979) serves as a timely reminder of the arguments presented so forcefully by Jacob Cohen (1977 and elsewhere) for including a statistical power analysis as the basis for sample size selection in applications of traditional hypothesis-testing procedures. However, their article contains a number of technical errors of which readers of this journal should be aware. I would like to concentrate on two of the more serious ones.

The Cohen and Hyman thesis goes something like this: (1) many studies have low statistical power; (2) lots of null hypotheses still get rejected; (3) therefore, something is rotten. The culprit (they argue) is over-reliance on the level of statistical significance, and the solution (they further argue) is to focus on effect size.

There are several problems with this argument. First of all, one has to be very careful with their claim that "over 70 percent of ... studies ... lack statistical power" (p. 12). With respect to what alternative hypotheses? A simple randomized design with three subjects per treatment condition can have very high power for testing an effect size of five or more against no effect. And how low is low? Choice of power level is a very personal decision; some people can stand very high risks of Type II errors, others cannot. But even granting that many studies seem "predestined" to retain the null when the results are in, the fault cannot lie with alpha-dependence. Virtually everyone uses .05 or .01, and if there appears to be an over-abundance of rejections of null hypotheses using a deck stacked in favor of their retention, the explanation has to be chance; that is the way the ball has bounced.

The recommended solution of deemphasizing significance level and reemphasizing effect size brings me to the second serious technical error in their paper. They propose the postulation of "game rules" in terms of differences between sample means.
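Before turning to their Example 2, the three-subjects claim above can be checked numerically. The following is a minimal sketch, not anything from the letter itself: it assumes a two-sample t test at alpha = .05 (two-sided) with equal groups of n = 3 and a population effect size of d = 5, and uses SciPy's noncentral t distribution to compute power.

```python
from scipy import stats

# Assumed design (for illustration only): two-sample t test,
# n = 3 per group, population effect size d = 5, alpha = .05 two-sided.
n = 3
d = 5.0
alpha = 0.05

df = 2 * n - 2                    # 4 degrees of freedom
ncp = d * (n / 2) ** 0.5          # noncentrality = d * sqrt(n/2) for equal groups
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Power = P(|T| > t_crit) when T follows the noncentral t distribution.
power = stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
print(round(power, 3))            # well above .9: "very high power" indeed
```

Even with only four degrees of freedom, the noncentrality parameter of roughly 6.1 puts nearly all of the alternative distribution beyond the critical value, which is the letter's point: power statements are meaningless without naming the alternative.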
In the clearest of their examples (Example 2), an effect size of .60 is judged to be important and "a d of 59 percent or less of the common within-standard deviation [sic; the word "group" was left out] will support the null" (p. 15). That just isn't true. If a population effect size of .60 is tested against a population effect size of 0, and a sample effect size of .59 is obtained after appropriate power considerations are taken into account, the null will be rejected, not retained. (One is much more likely to obtain a sample effect size of .59 if the population effect size is .60 than if it is 0.) In short, Cohen and Hyman have inappropriately substituted a minimum between-sample effect size for the population effect size.

The article contains other technical errors as well; for example, they refer to "the power of a statistic" (p. 12), although power pertains to a statistical test, not to a statistic. But these are too numerous to mention in a letter such as this. On page 13 of their article, the authors make the very perceptive observation that "barely a handful of researchers systematically figure the game rules before the fact." But an overemphasis on alpha is not to blame for that; barely a handful of researchers ever draw a random sample in the first place. That is where the fault lies.
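The parenthetical likelihood claim above can also be illustrated numerically. The sketch below is my own, under assumptions the letter does not specify: a two-sample t test with a hypothetical n = 50 per group. A sample effect size maps to a t statistic, whose density can then be compared under population effect sizes of .60 and 0.

```python
from scipy import stats

# Hypothetical design (not from the letter): two groups of n = 50 each.
n = 50
df = 2 * n - 2
scale = (n / 2) ** 0.5            # t = d_sample * sqrt(n/2) for equal groups

d_sample = 0.59                   # the observed sample effect size in Example 2
t_obs = d_sample * scale

# Density of the observed t under population d = .60 (noncentral t)
# versus population d = 0 (central t).
lik_d60 = stats.nct.pdf(t_obs, df, 0.60 * scale)
lik_d0 = stats.t.pdf(t_obs, df)

# The data are far more probable under d = .60 than under d = 0, so a
# sample d of .59 supports rejecting the null, not retaining it.
ratio = lik_d60 / lik_d0
print(ratio > 1)                  # → True
```

The likelihood ratio overwhelmingly favors the population effect size of .60, which is exactly why a sample d just below the judged-important value does not "support the null."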