What If Most of the Findings Published in Psychology (and Medicine and Biology and …) Journals Are False?


We are forced to confront this appalling question by the publication in Science this past August of an article by the Open Science Collaboration, “Estimating the Reproducibility of Psychological Science”. The article reports the empirical results of attempts to replicate 100 studies published in 2008 in three leading psychology journals. The basic finding is that only 39 results out of 100 could be replicated. Moreover, only 47 of the original results were in the 95% confidence interval of the replication effect size. If this sample of 100 studies is representative of the best research in psychological science, then most elite psychological science is rubbish. (A brief and helpful discussion of the article was published at about the same time in Nature.)

The “Open Science Collaboration” was an ad hoc collection of 270 research psychologists assembled by social psychologist Brian Nosek of the University of Virginia through his Center for Open Science, “a non-profit technology company providing free and open services to increase inclusivity and transparency of research.” The three journals in question were Psychological Science, The Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition. Assignment of articles to replication teams was constrained both by chronological order of publication (to prevent cherry picking or other bias in selection of studies) and by the interest and expertise of the replication teams. Researchers with the appropriate expertise to conduct particular studies were actively recruited. Significantly, the authors of the original studies were consulted and critiqued the replication designs and provided the original stimulus materials where possible.

Thus, although some of the replication failures must no doubt have been due to inadequacies in the designs of the replication studies, it is unlikely that this accounts for the bulk of the replication failures, as some have suggested. Even if 15% of the replication failures resulted from flawed designs, the remaining replication failures would still amount to over half of the original studies. And even if half of the replication failures are thus flawed, the implication would still be that over 30% of what you read in an elite psychology journal is false. Feel any better?
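The arithmetic behind those two scenarios is easy to check. Here is a minimal sketch using only the figures quoted above (100 studies, 39 replicated); the function name and the two "flawed" fractions are illustrative choices, not anything taken from the Open Science Collaboration article.

```python
# Figures as quoted above from the Open Science Collaboration article.
TOTAL_STUDIES = 100
REPLICATED = 39
FAILURES = TOTAL_STUDIES - REPLICATED  # 61 failed replications


def genuine_failure_share(flawed_fraction: float) -> float:
    """Share of all original studies that still fail to replicate after
    writing off a hypothetical fraction of the failures as flawed replications."""
    genuine_failures = FAILURES * (1.0 - flawed_fraction)
    return genuine_failures / TOTAL_STUDIES


# The two hypothetical scenarios considered in the text.
for flawed in (0.15, 0.50):
    share = genuine_failure_share(flawed)
    print(f"{flawed:.0%} of failures flawed: {share:.1%} of studies still fail")
```

Writing off 15% of the failures still leaves more than half of the original studies unreplicated, and writing off half of them leaves 30.5%.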

Nor is it any good to claim, as some have, that replication failure is actually a good thing, because it teaches us all the fine specifications and nuances required to generate a given effect. That replication failure can lead to greater insight into the nature and causes of an effect is noted by the Open Science Collaboration and indeed by nearly all commentators on this topic that I have read in the past few days. The trouble is that that is not the sort of replication failure we are confronting.

To begin with, let us note that the replications attempted by the Open Science Collaboration were not so-called conceptual replications; i.e., attempts to generalize an effect to slightly different conditions or applications. For example, consider the well-known finding that self-control is a general but limited resource that can be depleted like a rubber band–driven propeller. Suppose that this finding was originally demonstrated by an experiment in which hungry participants were either allowed to eat from a tray of sweets or asked to resist the available sweets and eat radishes instead. Subsequently, the two groups are set to solve difficult—indeed, unsolvable—puzzles. Sweet eaters persist at the puzzles significantly longer than radish eaters, which is explained by the experimenters as being due to the radish eaters having depleted their self-control earlier when they were resisting sweets. Now, a conceptual replication of this original experiment would attempt to extend or generalize the finding by changing the conditions somewhat, say by asking participants to count backwards by threes while annoying loud music plays instead of resisting sweets, or to squeeze a stiff hand grip for as long as they can instead of persisting at puzzle-solving.

Such a replication, if successful, would demonstrate the operation of the putative psychological process in novel circumstances, and it would teach us something about the conditions governing the operation of that process. And if unsuccessful, we would begin to learn something about the limits to generalizability of the underlying process. But this is not the sort of replication that the Open Science Collaboration authors were attempting. Rather, they were attempting to obtain exact replications; i.e., as the name implies, replications in which the exact conditions of the original study are duplicated as closely as possible. In such conditions, the same result, if valid, should usually be observed.

Of course, a failure of exact replication still might only mean that the conditions determining the original result are insufficiently understood, that is, that the original conditions had features whose importance for generating the effect wasn’t realized and which therefore weren’t duplicated in the replication attempt. However, there are two problems with this suggestion. First, in many cases it would gut the result of much of its interest. In the self-control example, for instance, the discovery that self-control is a limited resource only for puzzle-solving, not for other forms of work, seriously limits its practical and even theoretical interest. Again, in clinical trials, failure to replicate the efficacy of some treatment may seriously restrict or even altogether destroy its value.

But this is assuming that the original effects can be eventually replicated somehow, only under less general conditions than originally supposed. The more serious worry is that this might not be so, that the original result is simply a mirage that eventually turns out not to be replicable at all. And indeed, this is precisely the phenomenon that has come to be observed, with increasing alarm and in a variety of fields, not just psychology, over the past ten to fifteen years. For example, in 2005 epidemiologist John Ioannidis published in the Journal of the American Medical Association a study examining the fate of all the original clinical medical research studies published between 1990 and 2003 that had garnered 1000 or more literature citations. There were 45 such studies that had found a positive effect. Of these, 7 (16%) were contradicted by subsequent studies, 7 (16%) were able to be replicated only with a clinical effect half the size of the original study or less, 20 (44%) were replicated, and 11 (24%) remained unchallenged (in many cases probably because they were too recent for follow up studies to have been completed). Thus, even in medicine, and the most prestigious medicine at that, about a third of studies have serious replication problems (and over 40% of those for which replication was attempted). The affected studies are important research. Among the major studies that were flatly contradicted by subsequent replication attempts were the “finding” that hormone replacement therapy reduces the risk of coronary artery disease in postmenopausal women and the “finding” that vitamin E reduces the risk of coronary artery disease and myocardial infarction. In every case, the replication studies used larger sample sizes and stricter controls than the original studies. Thus, the replication failures are not likely to be due to laxity in the replication attempts.
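The percentages in the Ioannidis tally can be recomputed directly from the counts given above. This is only a bookkeeping sketch; the dictionary keys are my own shorthand labels for the four outcome categories.

```python
# Outcome counts for the 45 highly cited positive studies, as quoted above.
outcomes = {
    "contradicted": 7,          # flatly contradicted by later studies
    "weaker_effect": 7,         # replicated only at half the effect size or less
    "replicated": 20,
    "unchallenged": 11,         # no replication attempt yet
}

total = sum(outcomes.values())
problematic = outcomes["contradicted"] + outcomes["weaker_effect"]
attempted = total - outcomes["unchallenged"]

share_of_all = problematic / total            # "about a third"
share_of_attempted = problematic / attempted  # "over 40%"

print(f"{problematic}/{total} = {share_of_all:.1%} of all studies")
print(f"{problematic}/{attempted} = {share_of_attempted:.1%} of studies with a replication attempt")
```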

Another example, described by Jonah Lehrer in the best single discussion of the replication debacle I have found (ironically in a popular journal), “The Truth Wears Off,” (The New Yorker, Dec. 13, 2010), concerns fluctuating asymmetry in biology. It is a fact that a high number of mutations in one’s personal genome tends to show up as bodily asymmetry, for example as different lengths of fingers on each hand. In 1991, Anders Møller, a Danish zoologist, found that female barn swallows strongly preferred to mate with males that had more symmetrical feathers. This was a spectacular result, since it seemed to show that female sexual attraction in barn swallows had evolved to use body symmetry as a proxy for high quality genomes, and it stimulated a flurry of follow up research. Over the next three years ten more studies were published, nine of which confirmed Møller’s finding in the case of barn swallows or extended it to other species, including humans. But then in 1994, of fourteen attempted replications or extensions of Møller’s result, only eight were successful. In 1995, only half of attempted replications were successful. In 1998, only one third. Moreover, the effect size was shrinking even among successful replications. According to Lehrer’s account, “between 1992 and 1997, the average effect size shrank by eighty percent.”

One more example (also from Lehrer). In the 1990s a series of large clinical trials “showed” that a new class of antipsychotic drugs, including those marketed under the names Abilify, Seroquel, and Zyprexa, strongly outperformed existing drugs at controlling the symptoms of schizophrenia. These drugs accordingly were approved and became big sellers. However, by 2007 follow up studies were showing effects dramatically less than in the original studies of the previous decade, and it is now to the point where many researchers claim that the newer drugs are “no better than first-generation antipsychotics, which have been in use since the 1950s.”

Lehrer gives further examples, and see also the papers cited in Yong (2012), Spellman (2015), and Lindsay (2015). Thus, the replication problem is not really new and not restricted to psychology, though I have the impression it is best documented in psychology, biology, and medicine. What distinguishes the Open Science Collaboration article is not the worry that many or even most new research findings are false, but its presentation of direct experimental evidence to this effect. In short, the epidemic of replication failures appears to be a problem of disappearing findings, not the normal, healthy, “wonderfully twisty” path of scientific discovery. It is the radical diminution or disappearance altogether of findings that turn out to be largely illusory. Cognitive psychologist Jonathan Schooler, who was alarmed to discover this problem in his own research and honest enough to acknowledge it, calls it “the decline effect.” Notably, in so calling it he follows J. B. Rhine, the famous pioneer of research in parapsychology, who also was frustrated by the tendency of his own positive findings to disappear over time.

If it has taken a while for the replication problem to come to the attention of psychologists, one reason is the reluctance of journals to publish replications, especially failed replications. Journals look to publish exciting, new, positive findings. They have an aversion to old news, and they particularly do not welcome studies that throw cold water on hot new findings. Some details on this aspect of the problem are provided in Ed Yong’s Nature piece, “Replication Studies: Bad Copy.” Yong cites the difficulty Stéphane Doyen experienced in trying to publish a failed replication of John Bargh’s famous study of age-related priming. This was the study where Bargh asked participants to unscramble short sentences, some of which contained words related to aging and the elderly, like Florida, wrinkle, bald, gray, and retired. The important finding was that participants thus primed with age-related words walked more slowly down the hall to the elevator after they believed the experiment was over than participants who had not been so primed. (The participants, of course, had not been informed of the true purpose of the experiment. Nothing had been done to explicitly alert them to the question of aging.) This finding, amusingly called the “Florida effect,” has become a classic with 3800 citations according to Google Scholar. Jonathan Haidt and Daniel Kahneman, in their recent books, both take the finding for granted as a fact (Haidt, The Happiness Hypothesis, 2006: 14; Kahneman, Thinking, Fast and Slow, 2011: 53). But I have the impression that there has never been an exact replication (in the above sense) of the effect. According to Yong, Doyen’s failed replication was rejected by multiple journals and finally had to be published in PLoS ONE, a multidisciplinary, open access journal that “accepts scientifically rigorous research, regardless of novelty. PLoS ONE’s broad scope provides a platform to publish primary research, including interdisciplinary and replication studies as well as negative results” (from the journal website). According to Yong, after Doyen’s paper was eventually thus published, it “drew an irate blog post from Bargh. Bargh described Doyen’s team as ‘inexpert researchers’ and later took issue with [Yong] for a blog post about the exchange.”

Now, ungracious reactions to unwelcome results are nothing new in the history of science, and the point is not to single out Bargh but to highlight just how tenuous may be the hard evidence that backs up even the most celebrated findings in a culture that discourages replication. If the Florida effect has never been exactly replicated but only conceptually replicated, and if the conceptual replications have never been exactly replicated either, and if over half of attempted exact replications fail, then how sure do we have a right to be that there is really any such thing as the Florida effect? I want to stress the importance of this question. The Florida effect is not just a psychological curiosity, an isolated finding to which we can take an easy come, easy go attitude. The underlying principle which the Florida effect is taken to illustrate—the breadth and power of associative memory to unconsciously influence our conscious thought processes and behavior—has become one of the architectonic principles of cognitive psychology in the past couple of decades. This is its role in both Haidt’s and Kahneman’s theories, for example. Clearly, it is critically important that the findings that support such principles be facts, not illusions.

It is time to ask: What explains the decline effect? How can it happen that so many carefully produced experimental findings evaporate? Our epidemiologist Ioannidis proposed an answer in a second, quite famous (3259 citations) paper also published in 2005, spectacularly titled, “Why Most Published Research Findings Are False.” The paper was published in a medical journal (PLoS Medicine), and Ioannidis seems to have genetic association studies very much in mind, but he does not qualify his claim that “it can be proven that most claimed research findings are false” by restricting it to any particular field or set of fields. This would be an amazing result, if it could really be proved, but I do not find Ioannidis’s argument, such as it is, very persuasive. He presents a set of statistical formulas, whose derivation he does not bother to present, and—much more importantly—whose assumptions he does not justify or even discuss. (The presentation of formal “results” ex cathedra, before which we are apparently supposed to prostrate ourselves like so many Medes before the Basileus, is an irritating feature of supposedly hard science journals. But I suppose it is déclassé to complain.) Nonetheless, the argument is interesting, which (besides the fact that it has apparently been influential) is the reason I take the trouble to comment on it.

The basic idea is not difficult. It can be put by saying that if the frequency of relations in the world to be discovered by experiment is sufficiently low, then even an experimental method with a seemingly low false positive rate will generate mostly false positives. Thus, suppose that the logical space of variables we are exploring contains 100,000 possible relations, of which only one is actual. And suppose our experimental method is capable of detecting such relations with a false positive rate of one in a hundred tests. Then, roughly, of every 100,000 tests performed, on average about 99,000 will return true negative results, about 1,000 will return false positives, and 1 will return a true positive. This means that the ratio of false to true positives is (roughly) 1,000 to 1. This is not a good ratio! It certainly confirms the assertion that “most claimed research findings are false.”

Ioannidis’s analysis is a bit more complex than what I have described, of course. In particular, it includes a factor for study power, which I am neglecting. But what I have described is the meat of the matter. It depends essentially on the factor Ioannidis calls R, the ratio of actual to possible relations among variables of interest to a given scientific question. Moreover, R does not have to be particularly small to start causing trouble. Suppose that rather than R = .00001, as in the previous case, we have only R = .1. Then, in 100,000 random tests, we should on average encounter 10,000 actual relations and 90,000 nonrelations. If we take the traditional α = .05 significance level as our false positive rate (instead of the .01 of the previous case), then we can expect 4,500 false positive results from our 90,000 nonrelation tests. Even assuming perfect power to detect the actual relations, 4,500 / 14,500 of our positive results, nearly a third, are false.
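Both scenarios can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not Ioannidis’s own formula: it treats R as the proportion of tested relations that are real (as I have done above), takes power as a plain parameter defaulting to 1, and uses a function name of my own invention.

```python
def false_positive_share(n_tests: int, prior: float, alpha: float,
                         power: float = 1.0) -> float:
    """Expected fraction of positive results that are false.

    n_tests: number of relations tested
    prior:   proportion of tested relations that are real (R in the text)
    alpha:   false positive rate of the method
    power:   probability of detecting a real relation (1.0 = perfect power)
    """
    real = n_tests * prior
    unreal = n_tests - real
    true_positives = real * power
    false_positives = unreal * alpha
    return false_positives / (true_positives + false_positives)


# Scenario 1: one real relation in 100,000, false positive rate .01.
rare = false_positive_share(100_000, 1 / 100_000, alpha=0.01)

# Scenario 2: R = .1, alpha = .05, perfect power.
common = false_positive_share(100_000, 0.1, alpha=0.05)

print(f"R = .00001: {rare:.2%} of positives are false")
print(f"R = .1:     {common:.2%} of positives are false")
```

Lowering `power` below 1 only makes matters worse, since it shrinks the number of true positives while leaving the false positives untouched.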

This is a clever point, which I admit I never thought about before. It is basically the problem of base rate neglect applied to the context of scientific research (see also Tversky and Kahneman, “Evidential Impact of Base Rates,” in Kahneman, Slovic, and Tversky, eds., Judgment under Uncertainty: Heuristics and Biases, 1982: 153–160). But I said I do not find it particularly persuasive. There are several reasons for this. For one thing, it could fairly easily be accommodated by using Bayesian statistics instead of traditional null-hypothesis statistical testing. For another, if this were a serious problem—if today’s typical published study had R = .01, for example (much less R = .00001)—then successful replication would practically never happen. But it does. Many studies are successfully replicated, and let us remember that the decline effect is so named just because effect sizes tend to shrink, not instantly disappear altogether. Thus, Ioannidis’s analysis does not account for the pattern of positive findings and their subsequent decline that we frequently observe.

More importantly, there are good reasons to think that R is usually not small. It may be small indeed in exploratory research of the kind Ioannidis seems sometimes to have in mind. In the practical example he provides, the investigators do a whole genome study to discover whether any of 100,000 genes are associated with susceptibility to schizophrenia. Thus, assuming perhaps 10 genes may be thus associated, we have R = .0001, and obviously this is quite a problem. (And it’s hard to believe for just this reason that genetic association studies really use traditional null-hypothesis statistical testing, but Ioannidis would know about this much better than I would.) But a great deal of research, and most of the sort I am concerned about, is not blindly exploratory in this manner.

For example, consider once again Bargh’s Florida effect. What is R liable to be in this case? How did Bargh decide on the design of his experiment? Was his research question, “What would make people walk slower than usual?”, and did he then choose to test age-related word priming at random from among several hundred possible variables? Certainly not. Rather, he most likely started from the hypothesis that associative memory is a pervasive cognitive structure capable of influencing almost any conscious process. This would be a hypothesis he had strong reason from previous research to suspect is true. From this hypothesis it would follow that semantic priming for aging might associatively affect almost any other process, including walking. Thus, an empirical finding that age-related word priming induces slow walking, which is quite startling a priori, helps confirm a hypothesis whose prior probability (in the Bayesian sense) is not particularly low. In Bargh’s experiment, then, plausibly a hypothesis with a relatively high prior probability (i.e., a relatively large R) is supported by evidence with relatively low prior probability. This is just the sort of condition that makes for strong confirmatory power, which is how Bargh’s experiment is usually interpreted. Most research in cognitive psychology—and I should think in most experimental science per se—is generated in this manner, not in a blind exploratory fashion. If so, then Ioannidis’s clever problem is not a serious threat to most research.

But if Ioannidis’s suggestion does not explain the decline effect, what does? Of the ideas I have surveyed, just two seem really plausible. Both can be summed up by the same word: bias.

The first source of bias is forcefully illustrated in a 2011 paper by Simmons, Nelson, and Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” The authors name four aspects of research design which investigators typically adjust on the fly, even though such adjustments can seriously increase the risk of obtaining a false-positive result. One of these is the choice of sample size. It is not unusual for an investigator, after collecting a certain number of observations without obtaining a significant result, to collect another round of observations in the hope of obtaining one. This procedure significantly increases the probability of obtaining a false-positive result, but it may not be mentioned in the methods section of the article where the results are reported.
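To see why an extra round of data collection inflates the false-positive rate, consider a small simulation. This is a sketch under assumptions of my own, not a reconstruction of the Simmons et al. simulation: two groups are drawn from the same normal population, a two-sample z-test with known unit variance stands in for the usual t-test, and the sample sizes (20 per group, then 10 more) and the seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def significant(a, b):
    """Two-sample z-test for equal-sized groups with known unit variance."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2.0 / n) ** 0.5
    return abs(z) > Z_CRIT


def simulate(n_sims=4000, first_n=20, extra_n=10, seed=1):
    """Return (fixed-n rate, optional-stopping rate) of false positives."""
    rng = random.Random(seed)
    fixed_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        # Both groups share one population, so every "effect" is a false positive.
        a = [rng.gauss(0, 1) for _ in range(first_n + extra_n)]
        b = [rng.gauss(0, 1) for _ in range(first_n + extra_n)]
        first_look = significant(a[:first_n], b[:first_n])
        second_look = significant(a, b)  # after collecting the extra round
        fixed_hits += first_look
        # The flexible investigator reports a finding if either look is significant.
        flexible_hits += first_look or second_look
    return fixed_hits / n_sims, flexible_hits / n_sims


fixed_rate, flexible_rate = simulate()
print(f"stop at n=20 regardless:       {fixed_rate:.3f}")
print(f"add 10 more if not significant: {flexible_rate:.3f}")
```

The fixed-sample rate should hover near the nominal .05, while the rate with one extra look comes out noticeably higher, and more looks would push it higher still.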

Another flexible aspect of research design is the choice of dependent variables. It is not unusual for more dependent measures to be collected than are reported. The danger of this should be obvious. Imagine that in the Florida effect study, for example, besides walking speed, the investigators also asked participants to estimate the ages of people shown in photographs, measured the time participants spent putting on their coats after the experiment, and asked participants to estimate the weight of a heavy object by lifting it. If these other measures are not found to be significantly related to age-related priming, it may not be reported that they were ever collected. Yet it is clear that the investigators have in effect performed four experiments, not just one. So the risk of a false-positive result is nearly quadrupled (to about 18.5% at the .05 level, if the four measures are independent), unbeknownst to the reader of the article in which the positive result is reported.
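The size of this inflation can be worked out exactly if we assume, for the sake of illustration, that the four dependent measures are statistically independent (real measures taken on the same participants usually are not, which would reduce the inflation somewhat). The function name here is my own.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests of a true null
    hypothesis comes out significant at the given alpha level."""
    return 1.0 - (1.0 - alpha) ** n_tests


one_measure = familywise_error_rate(0.05, 1)
four_measures = familywise_error_rate(0.05, 4)

print(f"one dependent measure:   {one_measure:.3f}")
print(f"four dependent measures: {four_measures:.3f}")
print(f"inflation factor:        {four_measures / one_measure:.2f}")
```

Under independence, the chance of at least one spurious hit is 1 − .95⁴, a little under four times the nominal rate.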

The other two flexible aspects of research analyzed by Simmons et al. were the use of covariates (which may sometimes be employed with little theoretical justification although they significantly alter the strength of the relation between the independent and dependent variables) and the reporting of subsets of experimental conditions (for example, if a treatment was administered at low, medium, and high levels, any of the three possible pairwise comparisons [low–medium, medium–high, or low–high] might produce a significant result even when the linear relation between the three [low–medium–high] does not).

The authors ran a computer simulation in which 15,000 random samples were drawn from a normal population and flexibly analyzed according to the four practices they examined. When all four practices were combined, a significant “publishable” result was produced 60.7% of the time at the .05 level, and 21.5% of the time at the .01 level.

It should be noted—and stressed—that the use of these flexible, discretionary practices in the search for significant relations (and other practices, such as the elimination of “outliers” from a dataset and the mathematical transformation of variables) does not necessarily imply any sort of fraud or malicious intent on the part of the investigator. The practical reality of designing and conducting a study is almost always considerably messier than the model of logical scientific reasoning and methodology presented in the final report. It is common and accepted practice for design and analysis decisions to get made as a study proceeds. (For empirical evidence that the practices studied by Simmons et al., as well as other questionable research practices, are in fact commonplace, see John, Loewenstein, and Prelec, 2012.) Unfortunately, as Simmons et al. show, the sorts of discretionary decisions described here, in conjunction with an investigator’s powerful interest in finding a significant result, can result in a false positive rate far higher than .05. Most investigators probably have little awareness of this impact of their discretionary decision making on the quality—the credibility—of their results.

It is not only investigators who are biased in favor of significant results. Journals provide a second important source of bias. As I mentioned earlier, journals, and especially the more prestigious journals, want to see positive, exciting, novel results. In psychology, many of the best journals simply do not publish exact replications, whether successful or not. In view of the problem with experimenter bias just described, this obviously has the potential to create a very faulty body of “results.” It is not hard to see how a spectacular result, like the Florida effect, once published, might be quickly “supported” by a barrage of conceptual replications, which, in view of the problems of experimenter bias just discussed, might not be too hard to come by. And so excitement and confidence build around the Florida effect, even though (let us suppose) neither it nor any of its conceptual replications has ever been exactly replicated, because the journals are not interested in exact replication, despite the danger this presents that research programs and major theories in psychology might be constructed on illusory basic findings.

Over time, of course, findings that once were novel and exciting become orthodoxy and therefore fair game for revision and even attack. Novelty and excitement can now be had at their expense. So the biases in their favor become relaxed and research supporting them declines. It might even become interesting to show that they can’t be exactly replicated. The decline effect sets in.

How serious is the replication problem? Speaking for myself, I wasn’t much concerned until I began reading the articles on which this post is based. Until the Open Science Collaboration published their article last August, the arguments and complaints of the stats and methods heads were mostly theoretical and mostly indistinguishable from what any trained psychologist has been hearing from such people since the first year of graduate school. Everybody says they hate null-hypothesis statistical testing and ridicules the arbitrary .05 level, and everybody knows that a criterion of α = .05 for statistical significance means that one in every twenty tests of a true null hypothesis (on average) yields a falsely “significant” result. But null-hypothesis statistical testing goes on because it is easy to understand and perform, and as for the likelihood of false positives, what is the alternative? Halt research? So initially I wasn’t inclined to pay too much attention to navel-gazing meditations on the supposed crisis roiling psychological science for the past five years.

But what I realize now is that the decline effect is different. It is one thing to issue warnings about problems that might result from less-than-pristine research methods. It is quite another to document empirically that a wide swath of research findings, including some that are highly cited, is in fact evaporating. What makes the Open Science Collaboration article so important is that it presents the first hard evidence that exact replications are unobtainable for many and perhaps most research findings published in top psychology journals, so that the methodologists’ worries about the potential of questionable research methods to produce false results need to be taken a great deal more seriously than psychologists have been inclined to do up to now. If most of what is getting published in Psychological Science and The Journal of Personality and Social Psychology and The Journal of Experimental Psychology is false, then we have a genuine crisis.

The latest issue of Psychological Science has an editorial promising to take the replication crisis seriously and improve reviewing standards at the journal. What would have been more comforting is if the author had simply issued a new set of concrete requirements. (Simmons et al. suggest such a set, which would do for a start.) The good news is that I don’t think this problem is going away. People seem bound to keep pushing on this, and as long as people keep pushing, serious change is bound to come. Stay tuned for further developments.

Postscript added February 2, 2016

The latest APS Observer has a banner across the cover trumpeting “Psychological Science’s Commitment to Replicability.” (APS is the Association for Psychological Science.) Inside is a two-page interview with PS’s interim editor D. Stephen Lindsay—author of the PS editorial I mentioned at the end of my post. On first reading, I found his remarks as underwhelming as I had his editorial. There is the same emphasis on urging researchers to do better, as though brave resolutions to hold ourselves to stricter methodological standards will be sufficient to change behavior in the face of strong institutional incentives to publish exciting results (and none at all to embrace strict methodological standards). This seems particularly obnoxious and unrealistic coming from a senior scientist with nothing much at stake, to the extent that the advice is directed to students, postdocs, and junior faculty who are struggling to have any career at all.

There is also the same notion that readers can protect themselves against questionable research by watching out for studies with small sample sizes, surprising results, and p values too close to .05. This, when the root of the problem is “p hacking” methods of the kind described by Simmons et al. (2011), such as reporting only a subset of the dependent measures collected and continuing to collect data until a “significant” effect appears. As Simmons et al. show, the effect of these techniques is to vastly inflate the effective false-positive rate and in many cases to practically guarantee that at least some publishable result will be obtained. Yet, if the questionable techniques are not reported, there is no way for the reviewers or readers to know they were employed.

I have become so pessimistic about this situation as to think there is really no way to re-establish the credibility of psychological science but to start publishing exact replications in leading journals like PS. And very soon after my post was up, I regretted that I had not made the conclusion stronger, to the effect that if Lindsay were serious, he would move to start publishing exact replications in PS, since nothing less would fix the problem.

Fortunately, in just the past few days I have learned of two encouraging developments that are making me feel a whole lot better. First, PS has a program of “Open Practices” badges that can be awarded to articles whose authors conform to certain open science guidelines. There are three badges: Open Data, indicating that all the study data is permanently available in an open-access repository; Open Materials, indicating that all the study materials are likewise available, and also that the authors have provided complete instructions sufficient to enable outside investigators to perform an exact replication; and best of all, Preregistered, indicating that the design and data analysis plan for the study was preregistered in an open-access repository.

Preregistration of studies is a great idea, because nearly all the jiggery-pokery involved in p hacking would be prevented if an exact plan for data collection and analysis were published in advance of conducting a study. The Open Science Framework is the leading organization known to me that promotes and provides facilities for preregistering studies. The trouble, of course, is that it’s merely voluntary. That’s why something like the badge system in PS is good news. It provides a way for researchers to get credit for doing things right—or well—and a way for readers to gauge the methodological quality of articles. The badges appear on the title pages of the articles and in the journal table of contents. The latest issue of PS has Open Data badges for 7 of 13 articles and an Open Materials badge for 1. The previous issue had 6 Open Data and Open Materials badges out of 15 articles. No Preregistration badges, which is not surprising—that’s a very tough standard. Seems like not a bad start (the program is in its second year). Odd that Lindsay didn’t mention this program in his interview and gives it only the briefest of mentions in his editorial. I learned about it only because it is featured in the current edition of the weekly email that PS sends to APS members.

The second new development is that APS has launched a new type of journal article, the Registered Replication Report (RRR), which reports attempts to exactly replicate key findings. Target findings are identified by their influence and by their theoretical and practical importance. Study design and data collection protocols are developed in collaboration with the original authors when possible and preregistered, and data collection and analysis are performed by multiple institutions. The results across institutions are then meta-analyzed in an attempt to provide a definitive measurement of the effect size. So the RRR represents a formal, resource-intensive effort to exactly replicate a key finding.
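
For readers unfamiliar with what "meta-analyzed" means in practice, the simplest version is an inverse-variance weighted average of the per-lab estimates. The sketch below assumes a fixed-effect model, which may not be the model the RRRs actually use, and the lab numbers are made up purely for illustration; `pool_fixed_effect` is a hypothetical helper name.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted mean of the estimates and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical per-lab effect estimates and standard errors (not real RRR data).
lab_effects = [0.10, 0.22, 0.15, 0.18]
lab_ses = [0.08, 0.10, 0.06, 0.09]

pooled, pooled_se = pool_fixed_effect(lab_effects, lab_ses)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

The point of pooling is that more precise labs count for more, and the combined estimate is more precise than any single lab's, which is why a multi-site replication can pin down an effect size better than one study can.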

Only two RRRs have been completed so far. The first, published in Perspectives on Psychological Science last October, successfully replicated Schooler’s finding that verbally describing a person one observed commit a simulated bank robbery caused worse performance at recognizing that person in a lineup. Schooler’s effect is one of the vanishing effects featured by Lehrer in the New Yorker article I cited in my post. The successful RRR replication comes with caveats: The RRR effect size (16% worse recognition than controls) is considerably reduced from what Schooler originally found (25%), and the effect is only observed when there is a 20-minute delay between witnessing and verbally describing the event. Verbal description immediately after the event had minimal effect (4%).

The second RRR, to be published in the next issue of Perspectives, examined an effect in which behavior that was described imperfectively (what the person was doing) was interpreted as being more intentional and was imagined in more detail than if it was described perfectively (what the person did). This replication failed.

Replication failure is unpleasant. But the fact that it’s now being done, even on such a limited scale, is good news in the long run for restoring the credibility of psychological science.

18 thoughts on “What If Most of the Findings Published in Psychology (and Medicine and Biology and …) Journals Are False?”

  1. Here’s a more serious comment, a thought on further implications for adjacent fields, like moral and political philosophy. Suppose that much of elite psychological science is, as you say, “rubbish”–something I began to suspect the minute I entered a counseling psych program and started making my way through The Journal of Personality and Social Psychology and the like, and something that people have been saying about the pharmacology-oriented parts of elite medical journals for a very long time. (Here too.) Doesn’t this argue for the relative autonomy of normative theory from social science, and argue for ratcheting back the now-fashionable idea that normative theorizing has to be “rooted in” social scientific findings?

    I don’t mean that normative philosophical theory should altogether forswear reliance on the social sciences; that would be pointless and unproductive. I mean: shouldn’t we self-consciously be doing normative theory on at least two separate tracks? One track should consist of questions and inquiries that are relatively autonomous of the findings of the special sciences (hard, social, or otherwise). Another track should consist of “mixed” questions and inquiries that depend on the findings of the sciences. But what we absolutely should not do is to blur the two categories so that they collapse. After all, if it turns out that the social sciences are in worse epistemic shape than their most ardent boosters had suggested, it’s a serious mistake to hitch the wagon of normative theory to what seems at present to be a non-truth-tracking enterprise. We’ll just end up building on and replicating their errors.

    Nagel has been a vocal (and in many ways, effective) proponent of the autonomy of normative theory, but I actually got the idea of a two-track approach from that very unfashionable philosopher, Mortimer Adler, who describes it in his books The Difference of Man and the Difference It Makes and The Four Dimensions of Philosophy.


    • Concerning the two-track idea, without more details I’m not sure I have anything useful to say. I guess the autonomous track would concern the fundamentals and the less autonomous track would concern narrower matters of detail or implementation? Would it really be possible not to blur these categories?

      However, in general as I think you know I am not sympathetic to the idea of keeping philosophy autonomous from science. The question is, what is your philosophy going to be based on, if not on facts? I know there are actually some serious answers to this question and that even today many or maybe even most philosophers don’t think philosophy should be based on facts. But my own preference is that philosophy should have a factual basis. Therefore, since science is our source of facts, philosophy and science cannot be separated.

      It is interesting as a general matter to think about the question what is the basis for philosophical knowledge. Offhand I can think of five possible bases. One would be some special source of knowledge that does not depend on the ordinary biological senses (vision, touch, etc.). For example, I guess G. E. Moore thought we had some special cognitive access to The Good. Also Plotinus and other mystic philosophers could have held this. Also—perhaps surprisingly—I think David Lewis should be placed here, since he held that we have a special form of cognitive access to possible worlds. (How he maintained this without getting laughed out of Princeton mystifies me. His sheer intelligence must have been just that intimidating.)

      Another basis for philosophy would be reason a priori. Anyone who tells you the senses are the enemy and you must use pure reason to discern truth goes here: Plato, Descartes, Spinoza, and I think Kant.

      Then there’s “intuitions,” meaning firm opinions. Many and maybe even most philosophers today believe that philosophy simply is the organizing and fashioning of intuitions into a consistent system that preserves and explains as many intuitions as possible. Such a system of intuitions is said to be in “reflective equilibrium.” The blame for having convinced modern philosophers that this is how knowledge works—and certainly philosophical knowledge—goes to Nelson Goodman, I suppose, though its antecedents go way back, at least to Aristotle. Another important popularizer was Rawls, both for normative philosophy and philosophy in general. I’ll just say flat out that I hate this idea. I would rather be a Plotinus-style mystic than embrace the idea that our opinions carry epistemic weight just because they exist. To be clear, what I am opposed to is not the reasonable point that in the investigation of any problem, one necessarily starts from where one is; i.e., with the beliefs one already has, not from some hermetic position free of all preconceptions. There is no such pristine starting point, and I don’t think it would be desirable if there were. But this reasonable point is not what the proponents of reflective equilibrium are saying, or not the important thing they are saying. The basis of the methodology of reflective equilibrium is that all intuitions deserve consideration merely by existing, because ultimately there is nothing else. There is no such thing as hard scientific evidence, from sense-experience, say, or confirmed scientific theories that can be used to epistemically evaluate our opinions (sorry, “intuitions”). All we have is a pile of intuitions, many of which conflict, which we have to systematize and reconcile somehow. Obviously this program has greater plausibility in some domains than in others. For instance, more in moral philosophy than in physics. 
      I suppose it has more plausibility the harder it is to see how or what sense-experience or scientific theory would be relevant to a given domain. But, where (if anywhere) sense-experience or scientific theory are truly not relevant, we are in a lot of trouble. For, opinions per se count for nothing epistemically, and rebranding them “intuitions” doesn’t change that. I could go on, but I’ll restrain myself. Suffice it to say that intuition mongering implies some sort of constructivism about morals or whatever domain is in question. It is incompatible with realism.

      A fourth basis for philosophy is everyday knowledge, the kind we all acquire in the ordinary course of living and that requires no specialized investigation. I suppose this may be what you have in mind for the part of normative theory that should be kept relatively autonomous from science. The only philosopher I can think of who explicitly states that everyday knowledge as opposed to specialized knowledge should be the basis of philosophy is Leonard Peikoff.

      The fifth basis is all knowledge, including specialized scientific knowledge, of physics, psychology, biology, anthropology, linguistics, etc. This would be the view I favor. Boydstun too, I believe. And obviously lots of people. Russell would be a good exemplar.

      I won’t bother defending the all-knowledge view or attacking any of the others (except that I couldn’t resist giving intuition mongering a few swift kicks). I would just observe once again that specialized science is quite clearly relevant to philosophical problems on at least some occasions, especially when science has something startling and counterintuitive to tell us, such as for instance that what time it is depends on where you are in space and how fast you’re going. But once you’ve admitted this for some cases, it’s going to be hard to maintain a bright line between kinds of cases where science is relevant and kinds where it isn’t. For the rest, my main aim here has simply been to make a simple inventory of views about the basis of philosophy.


      • David,

        Let me take your comment from the top, and work down.

        I guess the autonomous track would concern the fundamentals and the less autonomous track would concern narrower matters of detail or implementation? Would it really be possible not to blur these categories?

        Yes to the first question. I think it should be possible to isolate fundamental from derivative questions in philosophy, where fundamental questions (or fundamental aspects of certain inquiries) can proceed without essential reliance on the special sciences, whereas other, derivative inquiries do rely on them.

        I’m not sure I would draw the distinction as fundamental/detail or fundamental/implementation. I would draw it as fundamental/derivative. For instance, there’s the part of the free will/determinism dispute that can be done from an armchair (fundamental), and then the part that requires integrating the fundamental account with what neurophysiology is telling us about the brain. But there’s an asymmetric order to the inquiry: we need to know what the conceptual options are before we start drawing big metaphysical conclusions from fMRI scans.

        In answer to the second question: I don’t see why not. There may be some blurring between the two categories at the edges of the category, but I don’t see why, in principle, every philosophical inquiry must rely on the findings of the special sciences.

        As for what you say further down: I agree that philosophy has to be based on facts, but don’t agree that the sciences are our only or even our best source of facts. Some of our sciences are–as you’ve pointed out–rubbish. We have better access to the facts by non-scientific means than we do via science. And there are topics that the sciences don’t investigate, either because they’re too abstract, or too particular, or because there are no incentives for studying them.

        Too abstract: scientists tend not to like abstract inquiries into the structure of the concepts they’re employing, or the coherence of the discipline as a whole. Everybody is supposed to know what “self-esteem” or “thought suppression” are, and it’s considered an unscientific waste of time to ask criterial questions about what counts as self-esteem or thought suppression. So such questions often go unasked within psychology.

        In my experience, counseling/clinical psychologists tend not to ask even the simplest questions about the coherence of the discipline as a whole. In counseling psychology, the thought-suppression literature tells you that thought suppression is the causal basis of a series of diagnosable mental illnesses, implying that if you don’t address thought-suppression in therapy, you won’t get at the root of the illness: you’ll merely ameliorate the client’s “surface” symptoms and produce new ones down the road. Meanwhile, the literature on “symptom substitution” tells you, blankly, that there is no evidence that failure to get at the “root” of an illness via therapy should ameliorate some symptoms but produce new ones in their place. These two literatures are running in tandem, often in the same journals, with no sense that the one is contradicting the other–indeed no sense that the one has any bearing on the other. You can read textbooks of counseling psychology in which the author invokes both literatures in different chapters of the book without any realization that doing so is incoherent (though I’ve linked to the fourth edition, I’m really referring to the second edition of this book).

        Too particular: Take almost any question in applied ethics or politics. I think that you’ll find the social scientific literature close to useless in producing the answer to any substantive question. I know it’s tooting my own horn, but my critique of Jason Brennan on character-based voting seems to me an example of this. The question is, “Is it permissible for people to engage in character-based voting?” Brennan purports to give a negative answer, based on the “best” political science in “the literature.” But if that’s the best that the “best political science” can do, it simply doesn’t answer the question in any straightforward way–and doesn’t even make a discernible contribution to doing so. What has Brennan’s vaunted invocation of “the literature” told us that we couldn’t have known in ignorance of the literature? My answer: close to nothing. And though that’s one example, I think the point can be generalized.

        I won’t belabor the “no incentives to study” issue. The very issue of replicability is an example of it. How is it that we’ve taken this long to figure out what so many people have suspected for decades–that huge swaths of “the scientific literature” are epistemically unreliable? The answer is that there were insufficient incentives to study it. But that’s true of lots of things.

        As for your canvassing of the sources of philosophical knowledge, I think it’s too pessimistic.

        I won’t pursue source 1, Moore/Lewis-type mysticism. Parenthetically, your surmise about Lewis is correct: you couldn’t laugh at him because he was intimidatingly brilliant (as was Kripke). But note that your way of describing source 1 blurs into source 2 (the a priori), and I think it would be uncharitable to describe a priori knowledge as literally mystical.

        As for source 2: I am skeptical of the existence of a priori knowledge, but that skepticism has to be balanced against the fact that I’m not skeptical about mathematical knowledge (or formal logic), and I personally know no fully worked-out empiricist account of math or logic. It’s not clear how to characterize the factual basis of, say, calculus or how to characterize our knowledge of it. There are two possibilities here. Either we have a priori knowledge of mathematical truths, or we have a kind of a posteriori knowledge that is different in kind from (or at least very different from) our knowledge of the findings of biology, chemistry, physics, medicine, psychology, sociology, etc. The “autonomy” proposal merely entails that some of philosophy is like mathematics in that way, however we end up cashing out the difference (whether as a priori knowledge or some other way). There are fundamental questions of conceptual structure where the analyses we produce are analogous to our knowledge of algebra or calculus. I’m not an intuitionist, but that seems to me the grain of truth in intuitionism.

        Source 3, intuitions and reflective equilibrium: I think you’re granting everything here that I’d want to say. I think we should distinguish two different claims here:

        (a) S’s belief that p has epistemic weight simply because S believes that p.

        (b) Coherence is a necessary condition of knowledge, on any plausible conception of knowledge.

        We (you and I, not everyone) agree that (a) is false. But (b) is not trivial. The method of reflective equilibrium is essentially a method of “coherentizing” our beliefs, and even if coherence is not by itself truth-tracking (I agree that it isn’t, except insofar as coherence is a necessary condition of knowledge, and knowledge entails truth), coherence is not nothing.

        Put it this way: take all your beliefs right now. Call that set B. The task of putting B into reflective equilibrium is a monumental task, and would be a monumental achievement that few people have ever managed. Is it necessarily knowledge? No. But it still can have enormous epistemic value. (Imagine teaching a class in which “all” you managed to accomplish was to induce the students to put their beliefs into reflective equilibrium.)

        Incidentally, though this is too long a thought to develop for this comment, I think Aristotle’s account of dialectic has theoretical advantages over the twentieth century Goodman-Rawls version of reflective equilibrium (granting the similarities). I meant to discuss this when you brought up Aristotle’s Topics in your post on Schwartz, but never got to it.

        Re source 4, everyday knowledge: yes, that’s what I had in mind. I guess Peikoff holds the view in question, and it’s a view I generally associate with Objectivism, but I can’t think of any one place in the Corpus Objectivisticum where it’s stated. Nagel defends it with respect to the autonomy of ethics from biology in “Ethics Without Biology” (reprinted in Mortal Questions). Mortimer Adler defends it in The Four Dimensions of Philosophy. So, in a very different way, does Michael Oakeshott in Rationalism in Politics. I don’t know of any philosopher or work that defends the broad thesis as such. I just find it a prima facie plausible view. We have an enormous amount of underrated everyday knowledge, and that knowledge is an essential source of a great deal of philosophical theorizing.

        All of the major philosophers prior to the twentieth century relied on everyday knowledge that we share with them–Plato, Aristotle, Aquinas, Locke, Mill. All of them wrote prior to what we could plausibly call modern science, so by definition, none of them relied on modern science in their major works. (Where they relied too heavily on the science of their own day, their work tends to be anachronistic and less useful to us.) But we still read them for philosophical insight (and get insight from them). I would say that that is only possible insofar as they are relying on everyday knowledge that remains knowledge for us. And especially in ethics, our identification and rejection of their worst errors was not primarily a matter of scientific discovery. We didn’t reject Aristotle’s theory of slavery or his views on the moral status of women or manual workers because one of the special sciences produced some technical finding that clinched the case. We don’t reject Aquinas’s views on masturbation because psychologists finally did the whiz-bang masturbation experiment that imparted the good news: it’s OK! (And no, we didn’t need Masters and Johnson or Kinsey to ratify the practice, either.) Mill’s “On Liberty” and “Subjection of Women” ring as true today as they did in the nineteenth century precisely because the facts he cites in those works are entirely ordinary, but counter-factually stable enough to persist across historical epochs and cultural contexts.

        Re source 5, all knowledge, I’m not denying that all knowledge (including specialized scientific knowledge) is relevant to philosophical inquiry somehow. Some inquiries require reliance on scientific knowledge, and we need philosophy to integrate the claims of knowledge as such, i.e., make them coherent with one another. But I would insist that “all knowledge” includes everyday knowledge (source 4). And if what you’re saying about contemporary science is correct, complete reliance on science would, in the present context, have to entail a wholesale shut-down of philosophical inquiry.

        After all, if much or most of the science on which “our” theorizing rests is “rubbish,” then so is the theorizing. Tenure and promotion aside, what point is there to doing rubbish-conducive theory? If philosophy is really that reliant on rubbish, then rubbish-avoidance entails that we should either stop doing philosophy or re-conceive it so that it’s not rubbish-reliant. Since I don’t think philosophy is that reliant on the special sciences, I think the problem is relatively contained. But if there are no bright lines to be drawn, philosophy would have to end up in the same rubbish heap as a great deal of science. Science is relevant to philosophy “on…some occasions,” but that’s compatible with philosophy’s relative autonomy from science on fundamentals. If I could issue prescriptions to the field as such, I’d say: let’s spend more time drawing bright lines that preserve the autonomy of our field, and less blurring them so that philosophy collapses into social science.


        • Irfan,

          You raise a lot of interesting points on which I have opinions, but I’m going to resist commenting on most of them. I think what’s needed here is a general discussion of the relation of science to philosophy, with an eye to the relation of social and biological science to ethics.

          First off, although I don’t think philosophy can be autonomous from science, that doesn’t mean I think philosophy collapses into science. In fact, the relevance of science to philosophy often seems pretty remote. My own dissertation work is a case in point. For all that I learned a great deal about color science, color vision, the visual system, neuroscience, and cognitive psychology in order to feel adequately prepared to write my dissertation, that knowledge hardly features at all in what I wrote. Thus, philosophers who work on color all agree on the physical facts about color, notwithstanding that there are enormous differences among philosophers about the nature of color. Moreover, although there have been large changes in our understanding of color vision during the past several decades, it couldn’t be said that these have altered the basic philosophical issues surrounding color much or at all. The same is true of sense-perception. Obviously we have learned gobs about sense-perception in recent decades, especially about vision, but this has not really affected the fundamental issues in the philosophy of perception as far as I can see. Not that the philosophy of perception hasn’t changed! It has changed enormously and fast. But the changes have been driven by the dialectic within philosophy, not by new scientific discoveries.

          This may be somewhat less than relevant, but I’ll tell the story anyway. When I was on the job market and people were reading summaries of my research, I would sometimes get remarks like, “Gee, considering your background in psychology, it’s funny that psychological science doesn’t play a larger role in your research.” Only scientistic philosophers said this sort of thing—that is, philosophers who champion science and place naturalism (the project of finding an explanation in terms of current science for everything, including things, like consciousness and intentionality, that might seem to be beyond the reach of such explanations) high on their list of priorities. I suspect that what they really meant to express was disappointment that my work didn’t support the program of naturalism. However, it’s perfectly true that you can read my dissertation and encounter very little science. And this was striking to me myself when I was writing it. I noticed, to my surprise and disappointment, that my background in psychology was of very little use to me in my philosophical work.

          So, I get it. Philosophy and science often seem to be completely separate realms. I have often had the experience, when trying to explain some philosophical problem or issue to colleagues in psychology, that they find it hard to see what the problem is even supposed to be and openly wonder why any intelligent person would concern himself with such a “problem.” Conversely, I am often bemused, not to say appalled, by the casual conceptual sloppiness and tolerance for inconsistency (an example of which you note in your comment) among scientists.

          I have a little theory of what’s going on here. I think scientists and philosophers have fundamentally different projects. Science is basically about explaining phenomena, which means discovering the principles that account for what goes on. This means science is about what goes on, what changes. Science explains, and if possible predicts, what comes to be. This is clearly very valuable, and it may encompass most of the general knowledge we would like to have. But not all. And I suggest that what is left is the province of philosophy. One of the sorts of knowledge left over is normative principles, as in epistemology and in moral and political philosophy (and possibly esthetics). Another is knowledge of what things are.

          This last is best explained by example. A color scientist explains why we have the color experiences we do. To do so, he appeals to the properties of light and the surfaces of objects, the cone receptors in our retinas with their different spectral sensitivities, and so forth. When he’s all done, he has explained why you see red when you look at a stop sign and green when you look at a ripe cucumber. But he hasn’t told you what color is. I mean, is the red of the stop sign a stable disposition of the sign to produce red experiences in normal viewers in standard conditions? Is it the physical microstructure of the surface of the sign that enables it to have this disposition? Is it the disposition of the surface to reflect a certain wavelength profile when illuminated by white light? Is it the red quality you subjectively experience the sign as having but which it really doesn’t have? Or what? These are the questions that interest a philosopher but which the color scientist has difficulty taking seriously as questions.

          The case is similar for sense-perception. A vision scientist tries to discover the conditions and principles that govern the extraction by the visual system of information about the distal environment. Under what conditions when one looks at such-and-such does one gain what information and by what processes? But when this whole story is told, the vision scientist will still not have told us what a visual perception is. Is a percept some sort of internal representation of what is seen? Or is it a sort of acquaintance with it (e.g., the distal object itself)? Is it a complex consisting of the whole causal chain that extends from the sensible qualities that supervene on the final perceptual brain state out to the distal object that stimulated the visual processing and that the perceptual state gives one access to? Or what? Again, these are the questions the philosopher is keen to answer but which don’t even occur to the vision scientist as questions.

          We can sloganize this difference between the aims of science and philosophy by saying that science tells us what things do, and philosophy tells us what things are.

          In the case of normative values and rules, I suppose it’s obvious that scientists don’t address themselves to these.

          If philosophy and science differ in this way, then we needn’t worry that philosophy will collapse into science. It is true historically that a lot of what philosophers used to speculate or theorize about has now been taken over by science. And it’s a good thing, too, as far as I’m concerned, because it means that studies like physics and psychology and linguistics have been put on a serious empirical footing instead of armchair cogitating. But this process can only go so far, because what science investigates is limited in the way I have indicated. Other inquiries, such as what things are and what values and rules are appropriate, will never be scientific matters. Is this autonomy enough for you? It seems to give you a lot of what you seem to want.

          But it isn’t really autonomy. Philosophy as I have construed it is still dependent on the facts that science discovers. True, unless science produces facts that are astonishing and unexpected, its findings are not liable to alter the correct philosophical account much. And often just a rough scientific account is enough to determine the outlines of a rough philosophical account. However, there are at least two sorts of case where science can have a strong influence on philosophy. The first is where science does produce facts that are astonishing and unexpected and that upset the seemingly established apple cart.

          My favorite example of this is the impact of relativity theory on the philosophy of time. A basic question in the philosophy of time is whether only the present exists or whether all of time, past, present, and future, exists. The former view, presentism, has always seemed to me to be the only sensible view. According to it, what exists is the whole spatial realm at the present moment. What is in the past does not exist. It used to exist, but now it is in the past; it has gone out of existence. Similarly, what is in the future does not yet exist. This is why, if I remember correctly, Aristotle says it is neither true nor false today that there will be a sea battle tomorrow, even if, let us suppose, when tomorrow comes it turns out (indeterministically) that a sea battle occurs. A truth needs, as it were, a truthmaker, and today the truthmaker for tomorrow’s sea battle does not exist.

          Anyway, all this good common sense got a kick in the head when Einstein came along with his theory of relativity and showed, so it seems, that the spatial realm that is cotemporaneous with a given moment at a given x, y, z coordinate differs depending on the speed and direction of what occupies that coordinate at that moment. Thus, consider a spatial realm of points all of which are at rest with respect to one another—what we call an inertial reference frame. Suppose two events, e1 and e2, take place simultaneously in this frame at different locations on the x axis, x1 and x2. At this moment, both events exist according to presentism. But now suppose there is an observer, O, who is moving very fast (say, half the speed of light) in the +x direction with respect to the reference frame. Then for O, e1 and e2 are not simultaneous. If O is at x1 when e1 happens, then for O, e2 has already happened! Thus, presentism has to say that for O, e1 exists at that moment but e2 does not exist. This seems to make existence observer-relative, so that certain things exist for some people and other things exist for other people. This is not the presentist conception! But it seems that, if relativity theory is right, the presentist must either make some sort of relativist modification or else give up the claim that only what is simultaneous with the present moment exists. Here science has had a major impact on what might have seemed to be a pretty sound philosophical theory securely based on everyday experience. (If anyone is interested in more on the impact of relativity theory on the philosophy of time, the earliest discussion of the problem known to me is also one of the best, which is Hilary Putnam’s essay, “Time and Physical Geometry,” Journal of Philosophy, 64 (1967): 240-247. The paper is also collected in his Mathematics, Matter, and Method.)
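
          The arithmetic behind this example can be sketched with the Lorentz transformation. This is an illustrative calculation, not taken from Putnam's paper. It introduces symbols not used above: c for the speed of light, (t, x) for coordinates in the rest frame, t' for time in O's frame, and v for O's velocity. It also assumes x2 > x1, which the example leaves implicit.

          ```latex
          \begin{align*}
          t' &= \gamma\left(t - \frac{v x}{c^{2}}\right),
             \qquad \gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}} \\
          v = \tfrac{c}{2} \;\Rightarrow\; \gamma &= \frac{1}{\sqrt{1 - 1/4}} = \frac{2}{\sqrt{3}} \approx 1.155 \\
          t'_{2} - t'_{1} &= \gamma\left[(t_{2} - t_{1}) - \frac{v\,(x_{2} - x_{1})}{c^{2}}\right]
             = -\frac{2}{\sqrt{3}} \cdot \frac{x_{2} - x_{1}}{2c}
             \qquad (\text{since } t_{2} = t_{1}) \\
             &= -\frac{x_{2} - x_{1}}{\sqrt{3}\,c} \approx -0.577\,\frac{x_{2} - x_{1}}{c}
          \end{align*}
          ```

          The difference is negative when x2 > x1, so in O's frame e2 comes before e1, which is the sense in which e2 "has already happened" for O. If instead x2 < x1, the sign flips and e2 still lies in O's future; either way the two events are not simultaneous for O.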

          The second sort of case where science can strongly influence philosophy is where even the gross facts are not that easy to establish, and so everyday knowledge or primitive scientific investigation is not sufficient to support philosophical conclusions. I think moral and political philosophy turns out to be this way. For example, an Aristotelian style eudaimonism requires a notion of human functions, so that well-being may be defined in terms of good functioning. But what is the nature of human functions? Are there really any such things, or is our talk of them just a fiction? How do we answer this question? Are they determined biologically by Darwinian natural selection? Could they be fixed culturally? And what is their content? How do we distinguish objective claims about what our functions are from mere opinions? It seems to me that to answer these questions, everyday knowledge won’t cut it. We are going to need the help of sciences such as evolutionary biology, hunter-gatherer anthropology, social psychology, empirical work on universals in human values such as the Schwartz theory, and so on.

          Similar remarks apply in the case of political philosophy even more obviously. Think of the influence of economic theory and especially of Austrian economics on libertarian political philosophy, for example. In general, what makes for a flourishing, prosperous society full of happy people is not obvious. (It is not even obvious that prosperity and happiness themselves are closely linked.) If it were, the people who say, “love is the answer,” would be right—it would be just that simple—and our basic social problems would have been solved long ago.

          Again, we are quick to produce moral judgments concerning particular events. But what is the psychology of this? Do these judgments reflect the application of abstract moral principles to concrete cases, or do they have some more purely emotional basis, as Hume supposed? Does the emotional dog wag the rational tail? Even if it does, how is the emotional dog formed? And if it doesn’t, what is the balance, the proportion of blind sentiment to rational principle? These are important questions for moral theory, and none has an obvious answer. We need science to help sort this out.

          You worry that if philosophy draws on science for its knowledge of facts and those “facts” turn out to be rubbish, then that philosophy will be rubbish too. True enough, but so what? Success is not guaranteed in the enterprise of knowledge, and that goes for philosophy as well as science. I’m not sure whether you just want philosophy to be as secure in its conclusions as possible or whether you think philosophy has to be potentially settled for all time, immune to the possibility of refutation by new evidence that besets science. The former is understandable; the latter I think is inappropriate.

          Everyday knowledge is not more secure than scientific knowledge. Other things being equal, it is less secure. Everyday knowledge of the past included such pseudo-facts as that the earth is flat and doesn’t move, that the sun goes around the earth, that objects fall with constant speed, and that maggots grow spontaneously from rotting meat. Scientific facts at a comparable level of generality have a better track record. Vague generalities of course tend to be pretty secure: the leaves of some trees come down in the fall; the moon cycles through its phases about every four weeks; when you drop objects, they fall; etc. But since science encompasses these, it has them covered. And since science encourages careful observation and testing, it is science that will weed out the pseudo-facts at this level, as well as establish more precise facts that everyday knowledge does not include.

          Where philosophy gets itself in trouble following science seems to be in adopting or taking for granted scientific theories. For example, Kant’s assumption that Newtonian mechanics is true, or Quine’s assumption that behaviorism is true. These two cases aren’t quite equivalent, actually. Whereas Kant had pretty good grounds for taking Newtonian mechanics seriously, behaviorism was always controversial. Moreover, behaviorism wasn’t even a theory, properly speaking: it was not a precise model making exact and testable predictions, like Newtonian mechanics. Behaviorism was always more of a research program, as is the cognitivism that replaced it. So in my opinion, Quine was foolish to place as much reliance on behaviorism as he did, whereas Kant was not foolish to rely on Newton, even though Newtonian mechanics also turned out to be false.

          Regardless of these particulars, though, I think philosophy must rely on scientific theory where scientific theory is relevant or even essential, as in the cases I’ve cited above. Of course, the enthusiasm with which philosophy does this should be tempered by the level of empirical support a scientific theory enjoys. For example, I think it would be absurd to try to deny that we have to take seriously the threat posed to presentism by relativity theory on the grounds that, who knows, the theory might be false. Relativity is probably the most well-supported scientific theory we possess. On the other hand, the dual-process theory of cognition, much as I love it, has hardly the level of support that would enable one to treat it as an established fact or place great theoretical weight on it—not at least without making an acknowledgment that what one bases on it is no more credible than it is.

          One last point. The proper response to the revelation that huge amounts of published scientific “findings” may well be worthless pseudo-findings is not to avoid science but to fix it. Of course, in the short run, I really think great care is now necessary. I find myself a bit traumatized by this, and no joke. I am currently reading a scientific article on the motivations for prosocial behavior in very young children. It’s the first I’ve read since discovering the replication problem, and I have to say that every time the authors make some claim about what children do or don’t do in what conditions and base their claim on a single paper, I think, “Oh, yeah?!” It’s terrible. But long term, of course, the solution cannot be to run away. We can’t do without science, even as philosophers. The only answer is to get the problem corrected, mainly by changing publication standards and incentives, and especially by making replication studies an important part of research.


    • Clearly you haven’t read Jason Brennan’s takedown of sophistic normative theorizing:

      “To what degree philosophical beliefs (or whether having an integrated philosophy) affect our behavior or our mental health is a social scientific question. In principle, we could test whether studying philosophy, or in particular studying or adopting Objectivism, makes people happier, more confident, less fearful and guilty, or whatnot. We could in principle study whether any correlations (between philosophical beliefs and behaviors, or between philosophical beliefs and psychological health) are selection or treatment effects. We could test to what degree people compartmentalize their beliefs, and how such compartmentalization affects their health and behavior. Some people actually do such work.

      If we wanted to know whether changing philosophical beliefs changes behavior, we’d study this with the tools of the social sciences. We’d suspend judgment until the evidence comes in. After all, it’s possible that changing our deep beliefs has little impact on our day-to-day activity. (It’s not like Berkeley acted much different from Reid.) It’s also possible that our behaviors and our psychological health are caused by deeper factors, and our philosophical beliefs are merely epiphenomenal. Perhaps the emotional dog wags the rational tail. Perhaps the economic superstructure causes us to behave in certain ways and believe certain things. Perhaps we’re each genetically disposed to behave in certain ways and to have a particular degree of psychological health, and we end up accepting beliefs that go along with those dispositions. (Unusually altruistic people become utilitarians or whatnot.) Or perhaps adopting new philosophical beliefs has the effects Rand claims it does.

      Again, these are all social scientific questions. A rational person would try to use good social scientific methods to tease out the causality here, and would suspend judgment until the evidence comes in. A rational person would not try to answer this stuff from the armchair, the way Rand does.”



      • I have read it. I just didn’t think enough of it to respond. He accuses Rand of histrionics and hand-waving, but the passage itself strikes me as guilty of precisely those things.

        First, let me distinguish two things. In one respect, Brennan is specifically taking issue with Rand’s essay. In another respect, he is taking issue with a conception of philosophy that is not specifically Rand’s but more general than that, and does not depend on the specific claims she makes in that essay. I am putting my eggs in the latter, not the former basket. I actually agree with Brennan that much of what Rand says in defense of philosophy is histrionic and hand-waving. But it doesn’t follow–and isn’t true–that his defense of a basically neo-positivist conception of philosophy does any better.

        It would be a waste of time to over-analyze what seems to me a post that Brennan tossed off in a few minutes. But let me take it from the top and move down:

        To what degree philosophical beliefs (or whether having an integrated philosophy) affect our behavior or our mental health is a social scientific question.

        What he’s saying here is not particularly clear. Does he mean that “to what degree philosophical beliefs affect our behavior or our mental health” is only a “social scientific” question? And what does that mean? Does it mean that some already-existing body of social science literature has already studied it? If so, which one and where? Or does he mean that ideally, such questions should only be studied by social scientists?

        A huge amount of work is being done here by the “only.” Before we do any experiments, we need to isolate the variables, produce plausible hypotheses about what might be causing what, and produce an experiment designed to track the relevant causal relations. Is Brennan saying that only social scientists are allowed to do all of that? No one who’s not a social scientist is allowed to isolate variables, produce hypotheses, provide analyses of the concepts involved in the hypotheses, or make inferences about causality? If that’s his view, the most obvious question to ask is: why? That question is neither posed nor answered. If that’s not his view, he hasn’t made his view clear.

        Here are some more unclarities. What is a social science, anyway? And who decides what counts as one? (Don’t tell me: only social scientists.) Is archaeology a social science? Stylometrics? Historiography? Musicology? All four fields involve people acting in society.

        I’m not pretending to be falsely naive here, or asking merely rhetorical questions. I really want to know: it’s not clear to me what the criterion is for something’s being a “social science.” The paradigms nowadays are (I guess) economics, political science, and sociology, but it isn’t clear to me what all three have in common that makes all three social sciences and tells us when a discipline is one. Near East Studies overlaps with comparative politics. Comparative politics is a branch of political “science.” Does that mean that NES is a “social science”? Is postcolonial theory therefore a social science? Is Islamic Studies a social science? Were Marshall Hodgson, Albert Hourani, and Fouad Ajami social scientists? I don’t know, and I doubt Jason Brennan does, either.

        It isn’t at all clear to me that psychology or psychiatry are “social sciences.” Psychiatry is supposed to be a branch of medicine, but though public health is a social science, no one thinks that cardiology or immunology are. I don’t know where that ultimately puts psychiatry. As for psychology, there is such a thing as social psychology, but I don’t think cognitive psychology is a social science, and “social science” is a pretty dubious description of clinical or counseling psychology. Is lifespan development a social scientific topic? In philosophy, we have moral psychology, but that obviously isn’t a social science. I’d be curious to hear what Brennan wants to say here. On the face of it, what he’s saying doesn’t really map onto what I know of “Psychology” as a discipline.

        Here’s the next claim:

        We could in principle study whether any correlations (between philosophical beliefs and behaviors, or between philosophical beliefs and psychological health) are selection or treatment effects. We could test to what degree people compartmentalize their beliefs, and how such compartmentalization affects their health and behavior. Some people actually do such work.

        I’m sure we could. But where is the argument that only social scientists can study any of that? Experimental scientists may design the experiments and produce the experimental findings, but is his point that only experimental scientists are allowed to study experimental findings, or make inferences from them? Studying correlations is not the same as making causal inferences, as I’m sure he knows. Are only social scientists allowed to make causal inferences? Or is no one allowed to? The passage is neutral between two completely different readings, neither of them plausible.

        If we wanted to know whether changing philosophical beliefs changes behavior, we’d study this with the tools of the social sciences.

        Again, are “we” (who? just “social scientists,” whoever they turn out to be?) limited to those tools? (Never mind that we have no account of what a social science is, and no way of getting one, except via the “tools” of the “social science” whose identity is in question.) Conceptual analysis is not a social science tool. Does Brennan mean that we’re not allowed to use it in an inquiry of the kind he’s discussing? That’s the most obvious face-value reading of what he’s saying. If he’s saying something weaker, he can feel free to amend what he said, but that’s not the face-value reading.

        We’d suspend judgment until the evidence comes in.

        This sentence implies the following absurdity: unless we have social scientific evidence for a proposition p, we cannot believe it; we must wait for social scientific evidence for p or else suspend judgment. I don’t know whether he wants to say this for any p, or just any p that involves a correlation between two or more variables. Implication: When I’m driving down the Garden State Parkway, and there’s ice on the roadway, I shouldn’t believe that ice could cause my car to skid unless The Journal of Everyday Traffic Conditions produces a triple-blind study that produces statistically reliable (but possibly rubbishy) correlations among the following: Volvos going the limit, ice, highways, and skidding. If I see ice, I should just “suspend judgment” at 65 mph and let the social scientists tell the rest of the story.

        Well. Go back and read Potts’s blog post. Suppose ex hypothesi that large swaths of social science are “rubbish.” Then we must suspend judgment until we clear the rubbish, correct? In that case, shouldn’t Jason Brennan be counseling that we shut down the entire field of philosophy–including his own output? Oddly, he isn’t counseling that. He isn’t following what ought to be his own advice, either. He’s telling us that we should suspend judgment, while he himself makes ex cathedra judgments all over the place, and churns out book after book, and article after article, all of it putatively based on “social science,” and all of it–according to “the best” findings in “the literature”–unreliable rubbish. I’m sorry: what social science tells us that it’s OK to do this?


        After all, it’s possible that changing our deep beliefs has little impact on our day-to-day activity. (It’s not like Berkeley acted much different from Reid.) It’s also possible that our behaviors and our psychological health are caused by deeper factors, and our philosophical beliefs are merely epiphenomenal. Perhaps the emotional dog wags the rational tail. Perhaps the economic superstructure causes us to behave in certain ways and believe certain things. Perhaps we’re each genetically disposed to behave in certain ways and to have a particular degree of psychological health, and we end up accepting beliefs that go along with those dispositions. (Unusually altruistic people become utilitarians or whatnot.) Or perhaps adopting new philosophical beliefs has the effects Rand claims it does.

        Again, these are all social scientific questions.

        If I read him very uncharitably (and why wouldn’t I?), his asking these questions would be incoherent. If these are all social scientific questions, then a non-social-scientist like Brennan has no business asking them. He has no business generating questions, generating hypotheses, clarifying the concepts involved in the hypotheses, or making inferences from the experimental findings on those hypotheses. All of that is the business of Bona Fide Social Scientists, and he isn’t one (unless he has a revisionary conception of what counts as a “social scientist”).

        But if what he’s really trying to say is that you need social science to confirm otherwise plausible hypotheses about the causal relations between, say, beliefs and behavior, or beliefs and emotions, I would say that he’s ignoring the fact that there are two sources of information here. One is experimental findings, and the other is introspection. Introspection can’t be discounted as a source of information here because the experimental findings themselves rely on introspective reports. Take introspection out of the equation altogether, and there wouldn’t be experimental findings in psychology. Yes it’s true that introspection is sometimes unreliable. But so are experimental findings. And we only know about introspection’s unreliability by essential reliance on introspection itself. So introspection can’t be globally unreliable. Aristotle produced the cognitive theory of the emotions long before Aaron Beck did, but unlike Beck, Aristotle had no access to the resources of contemporary psychology. How did he do it? By the expedient Brennan both ignores and implicitly derides–by introspection.

        A rational person would not try to answer this stuff from the armchair, the way Rand does.

        A rational person would know that there’s a lot of work that has to be done from an armchair–and can be done from there–without essential reliance on social science. Every social scientist also knows that a great deal of social science begins with “armchair reflections” and a huge amount of the data in social science involves introspective reports of people sitting in armchairs. A rational person would also not give others advice that he doesn’t himself follow. Finally, a rational person would not rely heavily on preposterous-looking categorical claims (“all”) and then quietly ratchet them back every few sentences so as to keep his claims within the realm of plausibility.

        You knew you were baiting me by giving me a passage from Brennan, didn’t you, Matt? It doesn’t take a social scientist to have predicted the result.


    • Irfan,

      I think philosophy has special work in its theoretical areas and in ethical theory, and I think philosophers should keep straight what in their meta-field is drawn from science, from ordinary experience, and from conceptual/logical philosophical reflection. It’s good too, I think, to sort out what can only be answered by science, as opposed to philosophy itself, both in our own modern outlook and in our modern dissection of the thought of old guys like Aristotle.

      Formulating our conception of basic human nature should assimilate, I think, the findings of modern biology and of psychology, especially developmental psychology. What Jerome Kagan or Susan Carey, for example, report and interpret in their books, we should inspect, analyze, check for specific reproductions of experiments, and if evidently qualified, let flow into philosophic conception of human nature. There has to be integration, of course, with common experience and with the history of science and the formal disciplines.

      Thanks for the link to Nagel’s MORTAL QUESTIONS. I’m ordering it, and I’ll be especially interested in Chapter 10, “Ethics without Biology.”


  2. Pingback: What If Most of the Findings Published in Psychology (and Medicine and Biology and …) Journals Are False? | Write There

  3. Because my only thoughts are apparently about Aristotle, I want to float an idea that Irfan mentioned but didn’t develop about Aristotle’s conception of dialectic. I agree that Aristotle’s method is akin to, but importantly different from, coherentist approaches aimed at achieving reflective equilibrium. But the feature of Aristotelian dialectic — at least in principle, though perhaps not always in his own practice of it — that is most relevant here is that it enables us to avoid choosing between giving unconditional priority to scientific theories, ordinary knowledge / beliefs, ‘intuitions,’ or a priori knowledge (if there is any), but without simply treating ‘all knowledge’ on a par. Part of the trouble here is that we often aren’t in a position to say with certainty whether something counts as knowledge or not. In one sense, it’s trivially true that philosophy should draw on ‘all knowledge,’ but as Irfan suggests, even prevailing scientific theories often involve claims that turn out to be false or at least in serious need of qualification. So too, what we think of as ordinary knowledge sometimes turns out to be false or in need of qualification, and hence not knowledge after all, and the same thing is pretty clearly true of intuitions. In principle, all knowledge claims are open to question, and in advance of inquiry defending one or another set of claims we don’t know whether those claims amount to knowledge; what we have are (more or less justified, reliably formed, etc.) beliefs. But if we follow Aristotle’s lead, we won’t try to defend one or another source of beliefs as foundational for philosophy, with the others gaining entry only by courtesy of the privileged kind. 
    Instead, we’ll take ordinary belief, intuitions, scientific findings, and purportedly a priori knowledge as reputable beliefs, and a good philosophical theory will be one that seeks not only coherence among them, but explanation, where explanation crucially involves not explaining the truth of reputable beliefs (since much of philosophy is of interest primarily because reputable beliefs at least seem to conflict), but explaining why the reputable beliefs that we reject or revise seem true, whether as everyday beliefs, as ‘intuitions,’ as scientific findings, or whatever.

    On its face, this description of what Aristotelian dialectic does is so abstract as to be almost uninformative. But I don’t think it’s empty. It excludes the kind of ‘naturalism’ that gives current scientific theories veto power over what seems to most of us to be ordinary knowledge, but it also excludes the kind of armchair theorizing that allows scientific theories to have no serious impact on our philosophy. And it’s a strength, I think, that Aristotelian dialectic as such does not prescribe any highly determinate method. Our explanations for why certain reputable beliefs that we reject nonetheless seem true to many people can in principle include appeals to systematically distorting habits of thought (the kind of appeals we get from thinkers like Nietzsche and Marx or from naturalists who maintain that ordinary ‘knowledge’ of things like the causal efficacy of our rational agency is in fact illusory), but these explanations will have to stand on firmer ground than the theories that generate them, and meet broader criteria of coherence (coherence with, say, the theorist’s coming to know the truth of that theory, a point at which much Nietzschean, Marxian, and scientistic naturalism seems to me to face some serious problems). Otherwise put, Aristotelian dialectic doesn’t preclude radical revision (as Rawlsian / Nagelian reflective equilibrium often seems to do), but it does not license the subordination of ordinary knowledge and intuition to theory that we in fact find in much radically revisionary philosophy.

    That’s just a vague sketch of an admittedly vague idea, but perhaps it points in a better direction. For what it’s worth, the two best things I’ve read on Aristotelian dialectic recently (both of which are good in part because they try to situate Aristotle’s ‘method’ within contemporary epistemological debates) are Chris Shields’ ‘The Phainomenological Method in Aristotle’s Metaphysics’ and Stephen Boulter’s ‘The Aporetic Method and the Defense of Immodest Metaphysics,’ both in Aristotle on Method and Metaphysics, ed. Edward Feser (Palgrave-Macmillan 2013). Taken together, they make what I think is a strong case for the claim that Aristotelian dialectic takes ‘intuitions’ and ordinary beliefs seriously but is not limited to merely making our initial beliefs more coherent.


    • This is a totally off-topic, merely procedural thing, but are you (y’all) still having trouble editing comments? I find this issue rather baffling, because as the Site Administrator, I can do it with ease: you go into the Comment function from the dashboard, and you can edit. I would have thought that Authors have the same functionality, but maybe not.

      I’m just puzzled what exactly you see from the Author perspective. There’s no outward way for you to tell, but I recently upgraded the site to Premium, which gives me better access to tech support, so if you’re having this problem, email me and I’ll contact them and fix it. I guess it’s hard for you to explain what you see, because you need the contrast with what I’m seeing, which is precisely what none of us have. So in the worst case scenario, I’ll take a look at the dashboard from Carrie-Ann’s computer–she’s a PoT Author and I’m an Administrator (and we live near one another), so I’ll be able, at last, to find out What It’s Like to be an Author. But I have to get my Reason Papers editing done before I do that (I’m late as usual), or else I fear that she’ll beat me up.


      • Yep, I see no way to edit comments. I know this is especially pressing since I apparently can’t write a comment without some sort of typo or formatting error.


      • From my administration screen I can view all comments to all posts. To view them, I click “Comments” in the left hand navigation bar. And if the post is one of mine, then I can edit the comment even if I didn’t write it. So I have total control over everyone’s comments and replies to my own posts. But if I’m not the author of the post, then I can’t edit the comment, even if I wrote the comment.

        What would be really useful is a function that produces a list of a given author’s posts, comments, etc., or all the posts with a given keyword. Is that possible?


        • Re your first paragraph: Ah, I see. I wonder if there’s a way to give authors (lower case “a”) the power to edit their own comments.

          Re the second: You can get a list of a given author’s posts (though not their comments) by clicking the author’s name when it appears on one of their posts.


        • So here’s the issue–and I’m sorry I’m hijacking David’s combox for this. I’ve invited all contributors (lower case “c”) to the blog as Authors rather than Editors. Here is the difference between Authors and Editors.


          An Editor can create, edit, publish, and delete any post or page (not just their own), as well as moderate comments and manage categories, tags, and links.

          An Author can create, edit, publish, and delete only their own posts, as well as upload files and images. Authors do not have access to create, modify, or delete pages, nor can they modify posts by other users. Authors can edit comments made on their posts.

          My original thought was that as the site’s owner, I didn’t want to give invitees to the group blog the power to edit items on the site that were either unrelated to the blog, or for that matter, on the blog but not their own. In the worst case scenario, imagine that a flame war starts up over–I don’t know–the problem of universals, the possibility of synthetic a priori knowledge, Israel/Palestine, etc. Then some aggrieved malcontent decides to go on a rampage and sabotage half the site because Khawaja’s blog “sanctions evil,” and if someone sanctions evil, they’re fair game. Having been raised in the Objectivist intellectual milieu, this seemed an entirely plausible scenario: hell, things like that happened among Objectivists with alarming frequency. (Belated trigger warning: unpleasant/ridiculous memories.)

          An unintended consequence of that decision is that Authors can’t edit their own comments when they’re on others’ posts. I guess the best fix is to upgrade Authors to Editors. I’m just uncomfortable with the implication that by doing that, I’m ceding authors the right to edit or alter material that isn’t their own. By default, I have that power as Site Administrator, but I only exercise it on literal, obvious spam. Otherwise, my thought is that all content on the site should have a clear owner, and each owner should individually have exclusive control over his or her content–except in barely-conceivable cases where someone’s content puts someone in literal danger, or legal jeopardy, or somehow jeopardizes the existence of the site itself.
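
          The comment-editing rule at issue can be sketched as a small Python model. This is only a reading of the role descriptions quoted above and of what contributors in this thread report seeing; it is not WordPress code, and the names `Comment` and `can_edit_comment` and the lowercase role strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    author: str       # who wrote the comment
    post_author: str  # who wrote the post the comment appears under


def can_edit_comment(user: str, role: str, comment: Comment) -> bool:
    """Hypothetical model of the behaviour described in this thread."""
    if role in ("administrator", "editor"):
        # Administrators and Editors can moderate any comment.
        return True
    if role == "author":
        # Authors can edit comments made on their own posts,
        # whoever wrote the comment, and nothing else.
        return comment.post_author == user
    return False


# A comment by "alice" on a post by "bob" (placeholder names).
on_bobs_post = Comment(author="alice", post_author="bob")
alice_can_fix_own_typo = can_edit_comment("alice", "author", on_bobs_post)
bob_can_edit_alices_comment = can_edit_comment("bob", "author", on_bobs_post)
```

          On this model the comment's own author is locked out while the post's author is not, which is the unintended consequence described above. Promoting everyone to Editor removes the lockout only by granting edit rights over everything.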


  4. I haven’t forgotten this discussion; I’ll try to get back to it later this week. Meanwhile, I just happened to see this article in the online magazine Edge, on Richard Nisbett’s “crusade against multiple regression analysis.” My knowledge of statistics and social science methodology isn’t good enough to permit me to say anything useful on this, so I just throw it out there FYI, as related to David’s original post. For whatever it’s worth, despite my statistical near-illiteracy, I’ve often wondered about the very thing Nisbett discusses: what exactly are multiple regressions telling us about the causal relations in the real world that we want to understand?


    • I haven’t forgotten either. Actually, there have been some developments in the last few days, and I plan to add a short postscript to my original post soon—tomorrow, I hope.

      Meanwhile, I looked at the Nisbett thing. Rather odd, I thought. It reads like a brain dump. I think maybe the “conversation” consisted of the “interviewer” turning on the recorder and saying, “go.” I don’t know how far you got into it—I got about halfway myself—but the main point, as it develops, is an attack on the notion of a replication crisis and a defense of Business As Usual in psychological science. He’s basically saying, “Don’t worry, people! You can still believe what we psychologists say.” (“Pay no attention to that man behind the curtain!”)

      It’s odd that the piece is titled, “The Crusade Against Multiple Regression Analysis,” since that’s not what it’s about (unless he eventually returns to the topic). He barely mentions multiple regression analysis, actually, and never really explains what he’s got against it. What he explains, and very effectively too, is what can go wrong with observational studies sometimes. Though he overdraws his conclusion. I mean, for example, let’s just remember that observational data was good enough for Kepler! But what has any of this got to do with multiple regression analysis? Multiple regression analysis is a statistical technique, not a method of investigation. It is perfectly well suited to experimental studies, which is typically how it is used! So, whatever his beef is specifically with multiple regression analysis, I’m afraid I can’t say. He certainly doesn’t.
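
      To make the distinction between the technique and the kind of study concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: a hidden factor z drives both x and y, x has no effect on y at all, and the function names are made up. It is not drawn from Nisbett's piece.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def simple_slope(x, y):
    """Least-squares slope of y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_controlling_for(x, z, y):
    """Coefficient on x when y is regressed on both x and z
    (two-predictor least squares via the normal equations)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]            # hidden common cause
    x_observed = [zi + rng.gauss(0, 1) for zi in z]    # x as found "in the wild"
    x_assigned = [rng.gauss(0, 1) for _ in range(n)]   # x set by the experimenter
    y = [2 * zi + rng.gauss(0, 1) for zi in z]         # y depends on z only
    return z, x_observed, x_assigned, y


z, x_observed, x_assigned, y = simulate()
naive = simple_slope(x_observed, y)                     # misleadingly far from zero
adjusted = slope_controlling_for(x_observed, z, y)      # near zero once z is measured
experimental = simple_slope(x_assigned, y)              # near zero by design
```

      In this toy setup the misleading slope comes from leaving out z, not from the regression arithmetic, and the same arithmetic gives a sensible answer when x is assigned by the experimenter or when z is measured and included.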

      I believe he opens with these methodological complaints to establish himself as a hardheaded experimentalist who cares about rigorous scientific method, so that we’ll believe him when he says there’s no replication problem. But what he says about the replication problem indicates to me that he hasn’t really thought about it much and doesn’t really understand its nature. He dismisses replication failures by saying that the effects in question are small and delicate, so we have to expect that they won’t always show up. He gives the Florida effect specifically as an example. But of course, if we really understand the effect properly, the people trying to replicate it know to control for the sorts of factors that could disrupt it. And the psychological process involved in that effect is supposed to be perfectly general; it is not supposed to depend on a narrow set of experimental conditions. So the mystery of a failed replication has to be taken more seriously than the you-win-a-few-you-lose-a-few attitude Nisbett manifests. Moreover, I’m not sure there have ever been any successful exact replications of the Florida effect, or of the other effects Nisbett mentions. When he speaks of “replications,” he seems to be talking about conceptual replications, not exact replications. But the trouble with that is that conceptual replications can be generated by fishing expeditions and the slippery techniques exposed by Simmons, Nelson, and Simonsohn. They aren’t really replications at all, and when they fail, they aren’t reported. If Nisbett had thought about any of this, I presume he’d comment on it. He doesn’t because, my guess is, he hasn’t.


  5. Pingback: Postscript to “What If Most Research Findings Are False?” | Policy of Truth
