How we know in medicine

WHEN LIFE DOESN'T GIVE US LEMONS

When we discover that observations and experiments are part of a problem-solving strategy, a way of getting reliable answers to our questions, we realize that we can apply that same strategy to a huge variety of topics. Getting evidence is a great way to set aside several possible sources of misconceptions: our intuition, traditions, or the feeling that we know how things are. We want to focus on methodology that allows us to better understand actual reality, as a way to distinguish facts from vague ideas, and to separate truth from lies or error.

How do these ideas apply to more uncertain, noisy, complex and even active systems, such as those that include us, humans? We will discuss medicine, which has a methodologically scientific basis but is not limited to it, and we will show that, although evidence is fundamental, it is not sufficient. The complexity that appears in medicine can also act as a bridge between the issues pertaining to scientific fields and the more challenging problems that we must solve today and in which, when post-truth arises, it is more difficult for us to find clear answers.

Medicine has already solved many of the same problems that we face in all other areas today and, although it is not infallible nor does it provide all the answers, it has come a long way. How do we know if a treatment is effective? How do we decide which of many effective treatments should be applied to a specific patient? To answer these questions, we turn to evidence obtained through careful observation and experimentation. This is the approach of today's medicine, called evidence-based medicine, which seeks to establish the strength of evidence and the risks and benefits of treatments. In a 1996 British Medical Journal editorial, Canadian physician David Sackett explained the central principles of such medicine, which he defined as "the conscientious, explicit and judicious use of the best available evidence to make decisions about the care of individual patients." The main thrust of this approach is basing decisions on evidence rather than on anecdotes or personal opinions.

In ancient Greece, it was believed that the body had four fluids, or humors (blood, yellow bile, black bile and phlegm),¹ and that diseases were caused by an imbalance among these four humors. It did not seem unreasonable, in that context, to think that diseases could be cured if the patient's blood was extracted by cutting some blood vessels or using leeches to suck their blood. There was no evidence to support this idea, but the traditional approach kept it alive for centuries. Although Hippocrates, the "father of medicine", said that "there are, in fact, two things: science and opinion; the first generates knowledge, the second, ignorance", the reality was that science was not yet known then as it is now.

¹ Ancient physicians such as Hippocrates and Galen spoke of four types of temperament, which were related to the predominance of each of these four humors: the sanguine (blood), the melancholic (black bile), the choleric (yellow bile) and the phlegmatic (phlegm). Is it not surprising that we still use these words, although their foundation has been refuted?

The path from tradition-based medicine to evidence-based medicine was neither quick nor easy. Every drug that we now know works was approved after undergoing - passing - many rigorous tests, including tests conducted on humans with extreme medical and ethical care. We call these medical experiments on people clinical trials.

The first clinical trial ever recorded was conducted by the surgeon James Lind in 1747. Lind was a young Scot who set out to understand scurvy, a slow and relentless disease that, on long sea voyages, killed more sailors than naval battles or shipwrecks. An estimated 2 million sailors died from scurvy: a painful death in which their teeth loosened at the gums, their cheeks grew sunken, and their legs became so weak that they could not stand.

Today, we know that this disease is due to vitamin C deficiency. Yes, the same vitamin we already talked about in relation to colds and Pauling. This vitamin is necessary for the synthesis of collagen, an essential protein for skin, tendons, cartilage, muscles and other types of tissues. To stay healthy, we need to obtain vitamins from our diet. In particular, vitamin C is found in some fruits and vegetables, especially citrus fruits, and with a balanced diet we can rest assured that we are getting more than we need (the excess is eliminated through urine, so it is not a problem in principle). But in the days of long sea voyages without refrigerators, the diet of sailors consisted almost entirely of salted meat and crackers.

When, in the middle of the ocean, a sailor showed the first symptoms, he worked a little less. As he worked less, he was considered "lazy", in a horrible example of inversion in the interpretation of cause and consequence. Therefore, one of the treatments for these sick sailors was to make them work harder, with the idea that this would counter the symptoms. Of course, this weakened them even more.

As the maritime power of the countries depended on the health of the sailors, many wise Europeans tried for several years to understand the problem. They did not know about vitamins, nor did they know much about medicine. But neither did they know - and this was even more serious - scientific methodology that yields reliable evidence. They acted blindly, and often the cures they proposed were worse than the disease.

This went on for centuries, until James Lind boarded the ship HMS Salisbury, determined to find a cure. Dismissing anecdotes or the traditional way of handling this disease, he picked twelve sailors with scurvy and divided them into six pairs. He took special care to choose twelve sailors with a similar degree of disease progression, and gave them the same meal regimen, but for one difference: the possible treatment he would be testing. To each pair, he gave one of these substances: cider, vitriolic elixir (a solution of sulfuric acid), vinegar, seawater, oranges and lemons, or a mixture of mustard, garlic and radish. He also observed other sick sailors, to whom he gave no treatment. These sailors served as a control group with which to compare whether or not the different treatments had been effective: if those in the control group were not cured and those treated were, for example, the conclusion would have been that the treatments helped the sick; and in the opposite case, the conclusion would have been that they harmed them.

Only six days after starting the treatments, the two sailors who had ingested citrus fruits were cured of scurvy and the others were not. The brilliance of this was not only that he methodically tested different possible treatments, but that he generated different groups to compare the result obtained. If he had not compared sailors who had received different treatments, or had not observed sailors who had not received any treatment, he would not have been able to conclude what the effect of citrus fruits was on the disease.

Lind published these results in 1753, and a few years later, citrus fruits began to be included in the diet of sailors. His experiment showed that citrus cured scurvy, but, more importantly for us in terms of how to find out what the truth is, that experiment was, to our knowledge, the first clinical trial that included control groups. For this reason, Lind is today considered the "father of clinical trials".

Until very recently, there was no systematic consideration of the preferred methodology for ascertaining the efficacy of a potential drug or treatment. Modern medicine is based on scientifically obtained evidence (observations and experiments such as those we described in the previous chapter), but let us make one point clear before continuing: when physicians treat their patients, evidence is not their only input for making decisions. There is a lot of experience, and perhaps also some "expert intuition", like what happens when professional goalkeepers save penalties: this is not a supernatural ability, but a form of heuristic reasoning based on multiple variables analyzed at the same time. Thousands of practice saves have given them valuable knowledge that now operates below the threshold of conscious attention. Just as goalkeepers will not be able to rationally explain what they saw in the player to make the decision to dive left or right, surgeons might decide that patients should be operated on urgently because they sense that something is amiss. In medicine, the personal, the social and the human (the context, in general terms) also play a role.

Thus, in medical practice, many aspects are not necessarily "rational", probably because the people who give and receive medical treatments are much more than rational machines interacting with the environment. For this reason, the practice of medicine is not a science, but it does feed on it. Behind it all, there is evidence.

THE TRUTH BEHIND HAND WASHING

Although the modern history of evidence-based medicine began around 1940 with the first scientifically rigorous trials designed to determine the efficacy of certain treatments, its roots go back to the 18th and 19th centuries, to the work of a few pioneers such as Lind. Another pioneer was the Hungarian obstetrician Ignaz Philipp Semmelweis, whose discoveries in the maternity ward of the Vienna General Hospital made it possible to control one of the main sources of maternal mortality after childbirth. At least for a while.

In Semmelweis' time, it was not uncommon for women to die during or after childbirth. The most frequent cause of maternal mortality was an uncontrollable disease that broke out shortly after the birth of the baby and often resulted in the death of the newborns as well. Autopsies revealed a generalized deterioration, which was called puerperal fever. In Europe at the time, giving birth was a very real risk for women, even for those who had excellent health at the time of delivery.

The Vienna General Hospital was then among the best in Europe and had a huge maternity ward. Between 1841 and 1846, more than 20,000 deliveries took place there, in which almost 2,000 women died, most of them from puerperal fever. Practically one in ten. To compare and understand how large this number was, in today's world about 200 women die for every 100,000 births, 50 times fewer. If we were in Semmelweis' Vienna, for every 100,000 births there would be 10,000 maternal deaths, not 200.
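The comparison above is simple arithmetic, and it can be checked in a few lines. The sketch below uses the rounded figures quoted in the text (the real counts were "more than 20,000" deliveries and "almost 2,000" deaths), so the results are approximations. The function name deaths_per_100k is ours and purely illustrative.

```python
def deaths_per_100k(deaths, births):
    """Scale a raw death count to deaths per 100,000 births."""
    return deaths * 100_000 / births

# Rounded figures taken from the text, not exact historical counts.
vienna_rate = deaths_per_100k(deaths=2_000, births=20_000)
today_rate = 200  # deaths per 100,000 births, as quoted in the text

ratio = vienna_rate / today_rate

print(f"Vienna, 1841-1846: {vienna_rate:.0f} deaths per 100,000 births")
print(f"Today: {today_rate} deaths per 100,000 births")
print(f"Vienna's rate was about {ratio:.0f} times higher")
```

Expressing both figures on the same scale (deaths per 100,000 births) is what makes them comparable; a raw count of deaths says little without the number of births behind it.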

In 1847, the situation in the Viennese hospital got even worse: the death rate rose from 10% to almost 17%. Every sixth mother died in childbirth. Physicians assumed that puerperal fever was a natural and unpreventable occurrence in childbirth, and resignedly accepted this mortality. But Semmelweis observed that puerperal fever affected women who were perfectly healthy on admission to the hospital, and he searched almost obsessively for its cause in order to establish how to prevent it. He began by imagining possible explanations.

Let us also keep in mind the historical context: it was only about two decades later that the medical community learned, through the work of Louis Pasteur and others, that many diseases were caused by microorganisms, which allowed the germ theory (yes, a scientific theory like evolution) to be formulated.² In fact, Semmelweis' discoveries are antecedents that led to that theory.

² More on this in Chapter II.

Semmelweis thought of several hypotheses to explain puerperal fever, some of which might seem ridiculous in the light of our present knowledge. For example, one posited that wearing excessively tight clothes at the beginning of pregnancy caused "the fecal matter to be retained in the intestine and its putrid parts to enter the blood"; another questioned mothers giving birth lying on their backs instead of on their sides, and another claimed that they had a bad personal predisposition, which made them get sick and die. Several of the hypotheses that Semmelweis imagined pointed to women's behavior, something possibly related to the fact that all obstetricians were male. Nineteenth century medicine seems very primitive today,³ but it was the best knowledge available to physicians, whose authority and wisdom were not doubted in those days.

³ Hopefully our current medicine will look primitive a hundred years from now, in the style of the movie Star Trek, in which Dr. McCoy gives a tablet to a hospital patient who was to undergo dialysis, and the patient shortly thereafter exclaims...

Puerperal fever presented a curious paradox: women who gave birth at home with the help of a midwife - which was quite common - were 60 times less likely to die of puerperal fever than those who gave birth in the hospital. How could it be more dangerous to have a child in one of the best hospitals in Europe, with the best doctors of the day, than on a dirty mattress in a village house and under the care of a midwife? Even the poorest women who came to the hospital with a newborn delivered on the street did not get the infection, while those who had been admitted beforehand almost invariably became ill, especially if they had spent more than 24 hours dilated in the hospital environment.

These observations led Semmelweis to think that something was different at the hospital, and that this factor made women more likely to get puerperal fever there. He decided to analyze the deaths by collecting data and trying to draw conclusions from them. This procedure, with such a quantitative approach, was still rarely applied in medicine at the time and, if we think about it, this time was not so long ago.

In the hospital, there were two wards dedicated to the care of women in labor. When Semmelweis looked closely at the maternal death statistics, something struck him: women in ward 1 were 2.5 times more likely to die than those in ward 2.

What was different between the two wards? Women were assigned to the wards almost randomly, so that could not explain the difference in mortality. However, the first ward was staffed entirely by physicians and their trainees, all male, while the second ward was staffed by midwives and their trainees, all female. But even if it were clear that this difference existed, why would it matter?

Semmelweis was at his wits' end. Devastated, he wrote: "Everything was in doubt, everything seemed inexplicable. Only the enormous number of deaths was a reality." The answer came in the form of a tragic accident: a professor admired by Semmelweis died in 1847 after the scalpel of a student he was guiding during an autopsy cut his finger open. The symptoms and disorders caused by the disease that took his life were identical to those of women with puerperal fever. This aroused Semmelweis' suspicion that something from the corpse that the professor was autopsying had entered his blood and caused the disease. He called the hypothetical causative agent cadaverous particles. Had the women also been getting these particles in their blood? He then reanalyzed the maternal mortality of the two wards and realized that there was something definitely different between the two: the doctors taught and learned anatomy by performing autopsies. The doctors performed autopsies; the midwives did not.

Every dead patient, including women succumbing to puerperal fever, was brought to the autopsy room for teaching purposes. Often, doctors went directly from the autopsy room to attend to women in the delivery room. At best, between the two tasks, they washed their hands with soap (remember that no one knew about the existence of germs at that time). This circumstance led Semmelweis to a new hypothesis: perhaps the doctors were moving cadaverous particles from one place to another.

Semmelweis tested his hypothesis through an experiment in which the one variable he changed was hand hygiene: physicians had to wash their hands carefully and disinfect them with bleach after every autopsy and before assisting women in labor. Almost immediately, mortality in ward 1 dropped to the levels of ward 2, the one attended by midwives. In the following twelve months, Semmelweis' measures saved the lives of some 300 mothers and 250 babies. In its structure, the Semmelweis experiment was no different from what we imagined in the previous chapter with the remote control and battery replacement, although in this case it was a matter of life and death.

Semmelweis had intuited that physicians were unintentionally causing deaths from puerperal fever because they transferred cadaverous particles (today we would call them infectious microorganisms) from the dead bodies to the parturients. He had an intuition, yes, but he didn't just stick with it: he put it to the test. It was the results that showed him that his intuition was correct.

"None of us knew," Semmelweis later lamented, "that we were the cause of this." Thanks to him and his meticulous work, the tragedy was finally brought under control. Not only that, but with these discoveries, hand washing was born as a preventive measure to avoid disease.

Hand washing, availability of clean water and vaccinations are the preventive public health measures that continue to save the most lives today. Sometimes, I am struck by how easily we forget this, but it is perhaps understandable. After all, the lives saved are not obvious. We notice that someone dies, not that someone in another situation would have died. Let us give Semmelweis the long overdue recognition he did not get in his time.

If Semmelweis' life were a Hollywood movie, after his struggles and accomplishments, and lives saved, he would have been transformed into a hero destined for happiness and acknowledgment. He was not. The measures taken by Semmelweis were very unpopular, and although his results were solid and the data supported what he said, many physicians refused to accept that hand washing could save lives. A post-truth landmark in the history of medicine: there was information, it was supported by clear evidence known to all, and yet that information failed to change perceptions, ideas and behaviors.⁴

⁴ More on this in Chapter XV.

Semmelweis made many enemies, and in 1849 he had to leave the Vienna General Hospital. When he left, maternal mortality went up again. He continued to work in other hospitals, but never returned to his previous professional level. Years later, in 1879, Pasteur established that puerperal fever was caused by a bacterium of the genus Streptococcus. Women who had given birth were infected by streptococci introduced into their bodies through the placental wound.

If doctors did not wash their hands, women died. If they washed their hands and nothing else changed, fewer women died, so we can conclude that hand washing caused the drop in deaths. Knowing that one thing causes another is no small thing, and for this reason, experiments are central as a strategy to find out what the truth is. It is not a methodological whim, an aesthetic element. It is the difference between saving millions of lives and not saving them.

Washing your hands works. It is not irrelevant. And we know it because of this kind of evidence. This is something real, something that we can no longer discuss (or that we can discuss again if someone offers more abundant and solid evidence than we have today, which is a lot). This is, then and in practical terms, true.

We all agree that hand washing is a wonderful thing. That is one solution to a problem, but how do we go about finding the solution for all solutions? The first step is in the strategy of getting empirical evidence, whether observational or experimental, to find out what the truth is. The solution for all solutions begins by applying scientific methodology to questions that go beyond the typically scientific. And Semmelweis, without being aware of it, achieved this for medicine. Thanks to him, and to the work of some of his contemporaries, medicine began to shift from vague intuitions or traditions towards an evidence-based modality. The solution for all solutions begins with trying to find out what the truth is. But that's not enough: we know that post-truth can move forward even when the truth is known, so we must not only discover it, but also defend it.

THE RELIABILITY OF EVIDENCE

Not all evidence is equal. Evidence generates different degrees of certainty: some pieces of evidence are more reliable than others. Although the examples we will give are focused on medicine, the general approach is applicable to other fields. The path taken by medicine in this direction makes it a good case study, and may guide us in other areas that would benefit from greater use of evidence, such as communication, public policy or education: what we are interested in is a methodology that can work in other fields of knowledge.

For starters, how reliable are anecdotes? Not much. Yet, we often take them into account when making decisions. If we buy milk in one store, and then notice that it is past its sell-by date, we will probably not return to that store and go to another one next time around. But we don't really know whether that store is particularly careless about all the products, or whether the new store is better, let alone whether we could justify closing the first one or opening the second one on the basis of an isolated case. If we want to find out whether a potential drug works for something and we test it on a single person, we can never know whether what happens is due to the drug or to some particularity of that person. This is anecdotal evidence, and it is problematic. It is not scientific evidence in the sense that it is made up not of careful observations or careful experiments, but merely of random cases that, for whatever reasons, just happen to stand out. Anecdotal evidence could be true, or it might not. The drug that seems to work for one person might actually work for other people, but maybe not. An anecdote is not a fact. Neither are multiple anecdotes. The plural of anecdote is not data.

An opinion poll does not give us data either. In the face of events that occur in the real world, in the face of facts, no opinion is valid. One can give an opinion about a fact, but the fact itself is not debatable, and this is an important distinction.

This is not to say that opinions or anecdotes are not valuable. On the contrary, they express existing points of view and ideas. We can even use them to generate hypotheses that we can then test with more sophisticated mechanisms. But in factual matters we cannot build valid arguments based on them alone. In any case, they can be a starting point, but not a destination. Crushed willow bark was used as an analgesic for centuries, because there was anecdotal evidence that it worked, and it continued to be used traditionally. When its components were analyzed, it was discovered that it contained salicin, a substance that the body converts into the analgesic salicylic acid, the compound from which acetylsalicylic acid (aspirin) was later derived. Edward Jenner invented the first vaccine, against smallpox, after noticing that women who milked cows seemed to be protected against the disease (cows develop a bovine smallpox that causes only a mild illness in people, but can awaken in them a defense response that is effective against human smallpox). These are examples of anecdotes, or traditions, that gave rise to hypotheses that were later tested and generated knowledge.

At this point, I can't help noticing that I am selecting anecdotes to illustrate an idea in perhaps a kinder way, or to be able to tell relatable stories. This is another frequent use of anecdotes. Just as before, these anecdotes are not, by themselves, evidence that what I am saying is so. Anecdotes do not usually serve to generate valid arguments, but, at best, to illustrate or exemplify a particular point that was confirmed by more reliable evidence.

The problem arises when we use these anecdotes or opinion polls as if they were data that allow us to make decisions and not with the intention of telling a story or embellishing an idea. If someone refutes our position with specific and reliable evidence, we cannot defend ourselves by saying that many people agree with us. Reality is not a popularity contest.

But let us return to evidence obtained by scientific methodology, such as observations and experiments. Let's focus on another aspect that, until now, we had postponed: how to assess the quality of the evidence, how to know how much to trust it, a priori. And we need to address this because we can protect ourselves from post-truth by looking not only at whether or not there is evidence, but how reliable it is. We thus add one more layer of complexity to the solid evidence base.

Let's look at the widest range of reliability of evidence that could exist. At one extreme, with 0% confidence, we find, say, ideas that are not based on experience. Basically, a lottery in which we may be right or wrong and will never know which. This is not a type of evidence, even if we perceive it as such. At the other extreme, we find absolute truth, with 100% confidence. This truth, even if we assume it exists, is almost never accessible to our empirical methodology and, as we have been arguing, does not relate to the kind of practical truth we want to deal with here. We can say that the Earth is a planet that revolves around the Sun, yes. This is a certainty from the point of view of science, but if we wax purist (and philosophical), we cannot rule out that we are all living a common dream in the Matrix. So, in practice, to solve factual questions, we move in a range of certainties that ignores those two extremes of 0% and 100% confidence. We will try to narrow the range somewhat, and deal with weak evidence and stronger, more reliable evidence. And here the distinction again becomes useful for our purpose of getting closer to the truth. This is what we, as citizens, must train ourselves in: we must not only ask for evidence that supports the claims of others, but also understand, at least broadly, how reliable it is. With this approach, we do not need to become experts in every discipline. That would be impossible. What we can do is learn to assess the quality of the available evidence.

In the case of medicine, different types of evidence are arranged in a hierarchy according to their degree of reliability. In this hierarchy, we do not rule out anything: both weak and strong evidence is useful, as long as we are aware of the extent to which we can trust it.

To begin with, let us set up an imaginary pyramid, with the least reliable evidence at its base and the most reliable evidence at the top. In this way, we will build a hierarchy applicable to the biomedical field in which we will place only the evidence obtained from human beings, and thus exclude that obtained, for example, with laboratory animals or in vitro systems, which almost always complements or precedes that derived from human beings. As an example, we will analyze lung cancer.

Let's start with clinical cases. This would be the base of our evidence hierarchy pyramid. This is an analysis of what happens to one patient or a small group of patients. It would be the description, for example, of what happens to one or a few people who have lung cancer. It is almost anecdotal evidence. We cannot generalize too much from it, but it may allow us to imagine some hypothesis that could be tested in another situation.

Above clinical cases, we can locate some types of observational studies, i.e., those in which the researchers do not control any variable, but limit themselves to analyzing what happens in reality by collecting data and interpreting them. The simplest of all are prevalence studies or epidemiological studies, in which we observe, for example, the distribution of a disease at a given time: "Within this group of people, in this place and at this time, how many people have lung cancer?". Epidemiological studies do not give us information regarding possible causes. They are snapshots of a situation. Moving up a level in reliability - and complexity - within observational studies, we have case-control studies. These are epidemiological studies with an important peculiarity: a group of patients with a certain condition (the cases) is compared with another group that does not have it (the control group) and, in addition, the studies look back in time (they are retrospective studies) to understand how the two groups differ. They are very useful to identify possible risk factors for something. We can compare a group of people with lung cancer and a group of healthy people, and ask how they are different. It might be striking, for example, that the group with the disease also has a higher percentage of smokers. Other observational studies that give somewhat more reliable results are cohort studies. In this case, a group of people (cohort) is identified and followed over time (these are prospective, forward studies) to see how different exposures affect the outcome. They are often used to look at the effect of putative risk factors that cannot be controlled experimentally. Thus, you can take a group of people, identify who decides to become a smoker and who does not, follow them over time and find out if there are differences in the prevalence of lung cancer between the two groups.
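To make the difference between these designs concrete, here is a minimal sketch of how each one is commonly summarized with a single number. A cohort study follows exposed and unexposed people forward, so it can compare their risk of disease directly (a relative risk). A case-control study starts from people who are already ill, so it can only compare how common the exposure was in each group (an odds ratio). These two measures are not named in the text above, and every count below is invented for illustration: none of them is real data about smoking or lung cancer.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = cases_exposed / cases_unexposed
    odds_in_controls = controls_exposed / controls_unexposed
    return odds_in_cases / odds_in_controls


# HYPOTHETICAL cohort: 1,000 smokers and 1,000 non-smokers followed over time.
rr = relative_risk(exposed_cases=30, exposed_total=1_000,
                   unexposed_cases=3, unexposed_total=1_000)

# HYPOTHETICAL case-control study: 100 people with lung cancer, 100 without.
or_ = odds_ratio(cases_exposed=80, cases_unexposed=20,
                 controls_exposed=40, controls_unexposed=60)

print(f"Cohort study, relative risk: {rr:.1f}")
print(f"Case-control study, odds ratio: {or_:.1f}")
```

A value close to 1 would suggest no association between exposure and disease. Neither number proves causation on its own: both come from observation, and the groups may differ in ways other than smoking.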

Observations are very useful both for answering questions and for generating new questions about a given problem. As we have seen, the evidence they yield has varying degrees of reliability. If we want to understand whether smoking increases the probability of contracting lung cancer (the question we seek to answer), cohort studies provide more reliable evidence than case-control studies, which provide more information than basic epidemiological studies, and these, in turn, are generally more useful than clinical cases.

There are even higher steps of reliability. We can conduct experiments in which the researcher controls the variables. Always focusing only on biomedical areas, "above" observational studies, there are experiments in humans. Generally speaking, we can consider these more reliable than observations in humans. Of course, in the example of lung cancer we cannot, for ethical reasons, make some people smoke to see if they develop the disease or not, but we can use other strategies. Conducting experiments on humans has its methodological and ethical difficulties, which we will address below.

PLACEBOS AND HUMAN EXPERIMENTS

From Lind's clinical trial to combat scurvy to those being conducted now, the methodology has improved enormously. The first rigorous clinical trials did not begin until the middle of the 20th century, so we have not even been using this approach for a hundred years.

These trials are important today as a final stage to find out, for example, whether a drug works or not. Generally, a drug is first tested in laboratory animals and, if it is effective, it moves on to clinical trials in humans. It is important that, in an experiment, we can compare different things. If we are testing a potential new drug, we should, like Lind, give it to one group of people (the treatment group) and not to another (the control group). But, crucially, the only difference between the groups should be the drug being tested. Thus, if an improvement is observed in the treatment group over the control group, we can attribute that difference to the new drug. This would indicate a causal relationship in which we can state that the new drug does indeed cause an improvement in the patients. Knowing causes is extremely important in medicine (and in so many other fields).

How do we set up, in practice, two groups of people in which the only difference is the experimental drug? First, we must create two equal groups. If there are young people in one group and older people in the other, or women in one group and men in the other, the effects we observe might differ between the groups because the people are different, not because of the drug itself. Now, if one thing is clear, it is that people are not all the same. We are all different, including identical twins, who are clones, genetically speaking, but are also different in many ways, as anyone who has met twins knows. This problem is solved with statistics, not by carbon copying people. What we need is not two groups of identical clones, but two groups of people who, in statistical terms, are equal.

There are several ways to achieve this. We could, for example, try to “manually” make two statistically equal groups: if one group has a 25-year-old female, we include one in the other; if one group has a 40-year-old male smoker, so will the other, and so on. This approach works quite well from a technical point of view as long as it is meticulous. The first drawback is deciding which variables to take into account. Gender and age seem obvious, but should we also incorporate behavioral or cultural variables? Does it matter if someone is a vegetarian, sedentary, a badminton fanatic, or has attained a college level of education? This may matter more or less depending on the question we are trying to answer. We cannot generalize. But what is clear is that, even if we sort carefully, we will almost certainly be prone to different selection biases. How can we reduce our biases as much as possible?⁵ First, let's keep in mind that, if we get involved, so do our biases. The problem is not only incorporating biases, but also never knowing which ones and to what extent they affect our results.

⁵ More on this in Chapter VI.

In order to get as close as possible to the truth, we have to design and execute procedures that are not so dependent on us, that bring objectivity to our subjective gaze. One of the ways to achieve this is, again, perhaps a bit counter-intuitive, and that is to generate random groups of people, as with a lottery. If we have a sufficiently large number of people, where "sufficiently large" is measured in statistical terms, when we randomly separate people into groups, their differences even out. All of them. Thus, randomizing is a way to cancel out the differences between groups of people, including those differences that we are not even aware of. This idea of randomly generating groups and "treating" one and leaving the other as a "control group" is central to considering an experimental design to be rigorous.
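The claim that differences "even out" can be checked with a small simulation. What follows is only a sketch under invented assumptions: the volunteers, their ages and the 30% smoking rate are made up for illustration, and the names `make_population`, `randomize` and `summarize` are hypothetical helpers, not part of any real trial software.

```python
import random

def make_population(n, rng):
    """Create n hypothetical volunteers with an age and a smoking status."""
    return [
        {"age": rng.randint(18, 80), "smoker": rng.random() < 0.3}
        for _ in range(n)
    ]

def randomize(people, rng):
    """Shuffle the volunteers (the 'lottery') and split them into two equal groups."""
    shuffled = people[:]              # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]

def summarize(group):
    """Return the mean age and the fraction of smokers in a group."""
    mean_age = sum(p["age"] for p in group) / len(group)
    smoker_rate = sum(p["smoker"] for p in group) / len(group)
    return mean_age, smoker_rate

rng = random.Random(42)               # fixed seed so the sketch is repeatable
for n in (20, 200, 20000):
    treatment, control = randomize(make_population(n, rng), rng)
    age_t, smoke_t = summarize(treatment)
    age_c, smoke_c = summarize(control)
    print(f"n={n:>6}  age gap={abs(age_t - age_c):5.2f}  "
          f"smoker gap={abs(smoke_t - smoke_c):.3f}")
```

With only a handful of people the two groups can still differ noticeably by chance. As the number grows, the gaps tend to shrink, and the same happens for characteristics we never recorded.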

We have thus set up two statistically equal groups of people. We will treat one group with the drug we are evaluating, but what about the other? The obvious thing to do would be to give them nothing, but this is not very helpful. As we said, people are complicated. In the example of vitamin C, we saw how easy it is to believe that something works in medicine without having hard evidence that it really does. When a child gets hurt, parents can hug him, or maybe just put some cold water on the spot where he got hit. That, many times, is enough to make the child feel better. It happens to adults, too. If we get a headache and take a medicine we think is for a headache, we will probably feel better.

A smile or a warm, calm manner also helps to make us feel better. One of the most wonderful aspects of this type of testing - and perhaps one of the least understood - is the fact that sometimes there are substances or procedures that are not really effective, but make us feel better. This effect is known as the placebo effect.

The placebo effect is possibly one of the most studied biases and, perhaps, the one responsible for the fact that many continue to use homeopathy or acupuncture despite solid evidence that these practices are no more effective than a placebo.⁶

⁶ For those of you who, at this point, are thinking

I tend to keep in mind that, in these matters, intuition or personal experience is not very reliable. I try to set them aside and think about what evidence would convince me of something. But even when I try to be aware, I don't often succeed. Many times, intuition or "works for me" creeps in, and in a very tricky way: If the evidence ends up confirming what I already believed, I say "well, it was obvious", while if it doesn't, I try to think it over, look for more evidence or take longer to trust. These experiments, if well designed and implemented, with randomized groups and control groups, provide much more reliable answers than one's own experience, than worksformeism. That said, how hard it is to accept answers that contradict what we think! It is difficult to understand that we do not know something in the same way before and after obtaining those answers in careful experiments: even if the answer is the same, in one case it is an unconfirmed idea, and in the other, one that has been validated. Even so, there will be times when no evidence will make us change our position. I think that, when that happens, we can at least try to be aware that it is happening to us. Long live introspection.

It is not yet very clear how the placebo effect is produced, but the feeling of improvement is genuine. It is not due, at least not exclusively, to the patient's imagination: there are also biochemical changes in the brain, with neurotransmitters such as dopamine or endorphins being released. Let us emphasize this: we are talking about a feeling of well-being, reported subjectively by the patient, and not a real cure in the sense that a biological change is generated in the body. If we have only mild discomfort, a small headache, difficulty sleeping, etc., a placebo may be enough for us. But if we have serious illnesses, such as cancer or even a simple infection in a tooth, it can be extremely dangerous to postpone or discard medicine that we do know is effective in pursuit of treatments that might make us feel slightly better, but in no way affect the development of a tumor or the growth of bacteria.

How do we know whether or not we are dealing with a placebo effect? How do we know if a potential drug has a real effect beyond the placebo? Let us remember that our purpose is to get closer to the truth, and for that we need reliable evidence, not vague impressions.

Since the placebo effect exists, when we want to find out whether a drug works or not, the control group and the treatment group should receive something ostensibly identical, and the only difference should be what is being tested. If we are testing a potential analgesic that is administered in a tablet, then the control group should receive a tablet that is identical (in shape, size, color, taste, mode of administration, etc.) to that administered to the treatment group, but minus the analgesic drug in question. In fact, some studies show that the placebo effect also depends on the environment, the friendliness of the person giving the tablet, etc., so these conditions should also be the same for both groups. If, when comparing both groups, we see an improvement in the treatment group compared to the control group, we can conclude that the drug is effective. If the groups give identical results, then what we see is only the placebo effect.
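To see why the comparison must be against a placebo and not against "nothing", here is a toy simulation. It is a sketch under stated assumptions: the sizes of the placebo effect and of the drug effect (`PLACEBO_EFFECT`, `DRUG_EFFECT`) and the noise are invented numbers, and improvement is measured on an arbitrary scale.

```python
import random
from statistics import mean

PLACEBO_EFFECT = 1.0   # assumed improvement reported after taking any tablet
DRUG_EFFECT = 0.8      # assumed extra improvement caused by the active drug

def reported_improvement(gets_drug, rng):
    """Improvement one patient reports: placebo effect, noise, and maybe the drug."""
    noise = rng.gauss(0, 1.0)
    return PLACEBO_EFFECT + noise + (DRUG_EFFECT if gets_drug else 0.0)

def run_trial(n_per_group, rng):
    """Return the mean improvement in the treatment group and in the placebo group."""
    treatment = [reported_improvement(True, rng) for _ in range(n_per_group)]
    control = [reported_improvement(False, rng) for _ in range(n_per_group)]
    return mean(treatment), mean(control)

rng = random.Random(7)                 # fixed seed so the sketch is repeatable
mean_t, mean_c = run_trial(5000, rng)
print(f"treatment group: {mean_t:.2f}")
print(f"placebo group:   {mean_c:.2f}")
print(f"difference attributable to the drug: {mean_t - mean_c:.2f}")
```

Both groups report feeling better, so looking at the treatment group alone would overstate what the drug does. Only the difference between the groups estimates the drug's own contribution.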

These experiments that have at least two randomly generated groups, one treatment group and one control group, are known as randomized controlled trials (RCTs). They are controlled because there is a control group, and they are randomized because the groups were randomly generated.

To make sure that the patient does not guess from the physician's attitude whether he is administering the placebo tablet or the actual drug, we can simply withhold that information from the physician. This is a way of further eliminating the distortion of the results by factors unrelated to what we want to study. To achieve this, coding systems are used so that neither the physician nor the patient knows whether a tablet has the drug or the placebo, but those who then analyze the data, without having any contact with the physician or the patient during the test, do know. This is called double-blinding (it is double because it refers to both patients and physicians). There is even a triple blind, in which the results are analyzed without identifying which set of data corresponds to which group.
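One way to picture such a coding system is sketched below. Real trials use dedicated, audited procedures, so this is only an illustration under assumptions of our own: the kit-code format, the function names `build_allocation` and `unblind`, and the requirement of an even number of patients are all hypothetical.

```python
import random

def build_allocation(patient_ids, rng):
    """Randomly give each patient an opaque kit code tied to a hidden arm.

    Assumes an even number of patients so both arms have the same size.
    """
    arms = ["drug", "placebo"] * (len(patient_ids) // 2)
    rng.shuffle(arms)
    codes = rng.sample(range(100000, 1000000), len(arms))  # unique 6-digit codes
    # The key links code -> arm; it is kept away from physicians and patients.
    key = {f"KIT-{code}": arm for code, arm in zip(codes, arms)}
    # The dispensing list is all the physician sees: patient -> code, no arm.
    dispensing = dict(zip(patient_ids, key.keys()))
    return dispensing, key

def unblind(outcomes_by_patient, dispensing, key):
    """After data collection, group the outcomes by arm using the key."""
    by_arm = {"drug": [], "placebo": []}
    for patient, outcome in outcomes_by_patient.items():
        by_arm[key[dispensing[patient]]].append(outcome)
    return by_arm

rng = random.Random(1)
patients = [f"P{i:03d}" for i in range(1, 9)]
dispensing, key = build_allocation(patients, rng)
print(dispensing["P001"])   # a code such as KIT-xxxxxx, with no hint of the arm
```

The physician and the patient work only with `dispensing`. The `key` is consulted once the outcomes have been recorded. In a triple-blind design, even the analysts would see only neutral group labels until the analysis is finished.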

Beyond what medicine does about the placebo effect, I find this interesting at another level. The placebo effect shows that we are complex and that intuition can deceive us. It is important for us to genuinely accept this. So, in the fight against post-truth, if we believe something, but the evidence contradicts it, we can try to be aware of what is happening to us and change our position. Recognizing ourselves as victims of something as small as the placebo effect can help us gain the perspective we need to better resist the effects of post-truth.

When Lind conducted his controlled clinical trial testing different treatments for scurvy, he had not randomized the sailors and it was not a double-blind trial, but without those first steps we would not be, methodologically speaking, where we are today. So thank you, James.

In the field of biomedical research, RCTs are considered the most reliable type of evidence. Not only are they very meticulous and very useful for eliminating biases, but they also give us a type of information that is very valuable for understanding reality: they allow us, with great confidence, to establish a causal relationship.

We have discussed the methodological issues of clinical trials. What about the ethical aspects? Today, they are quite regulated and controlled. Before a clinical trial is conducted, the experimental design must be approved, not by those who will carry it out (they could inadvertently introduce bias, or simply have a conflict of interest), but by an external committee. Patients who take part in the trial must be adequately informed of the possible risks, and must sign a consent form. If there is already an effective treatment for a specific disease or condition, the control group, instead of receiving the placebo, should receive that treatment, for ethical reasons and because what doctors will focus on later is not so much whether the new experimental drug works or not, but whether it works better than what is already available.

For these reasons, if we have a controlled, randomized, triple-blind trial, we are in the crème de la crème of human research.

But of course, this is a single clinical trial. To be more certain that the result is correct, it should be replicated by other investigators and with other patients. If there are many randomized, double- or triple-blind clinical trials allowing us to conclude that a given drug is effective and safe, then the certainty increases. Note that we are still not talking about absolutes, but we are moving exclusively on a linear axis of greater or lesser certainty. Although a single RCT has very high reliability, if many RCTs that study the same thing give similar results, then this reliability is even greater.

Now, this approach of searching for observational or experimental evidence can also fail. High reliability and powerful evidence do not equal infallibility. Even without engaging in fraud or incompetence, the reasons for failures can be methodological, i.e., they can occur in designing the studies, collecting the data, or interpreting the results. Sometimes, an experiment is repeated and does not give the same results. Something that works in rats may not work in humans. The evidence for something may be incomplete or obtained with imperfect methodology. We should not despair or think that we will never know anything. Let us be willing to accept our mistakes and try to correct them. Knowing is better than not knowing.

Now, what if some studies give a result, but other studies contradict them? What we can do is take all the available evidence on a topic and analyze it statistically, taking into account how reliable each piece is. Does one piece of evidence come from an observational study or an experimental one? Did the RCT control group have a placebo or nothing? Was it a double-blind study or not? How large were the groups of people? This allows us to make a kind of "summary" of what the available evidence says so far, which has greater statistical power than the individual pieces of evidence. These analyses that encompass the evidence available so far are known as systematic reviews or meta-analyses, and are considered to be the most reliable point in this hierarchy of reliability of evidence in the medical field.
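As a rough illustration of how such a "summary" can carry more statistical weight than any single study, the sketch below uses one standard technique, fixed-effect inverse-variance pooling, in which more precise studies count for more. The four studies and their numbers are invented, and the function name `pool_fixed_effect` is ours. Real meta-analyses also weigh study design and quality and check whether the studies are comparable at all, none of which is shown here.

```python
from math import sqrt

def pool_fixed_effect(studies):
    """Combine (effect, standard_error) pairs by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for _, se in studies]   # precise studies weigh more
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    pooled_se = sqrt(1.0 / total)                    # smaller than any single SE
    return pooled, pooled_se

# Made-up numbers for illustration only: (estimated effect, standard error).
studies = [(0.30, 0.20), (0.10, 0.15), (0.25, 0.10), (-0.05, 0.25)]
pooled, pooled_se = pool_fixed_effect(studies)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

Even though one of these invented studies points in the opposite direction, the pooled estimate has a smaller standard error than the most precise individual study.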

The prefix meta is often used to indicate something that is one level "above" that same category. Just as metacognition is cognition about cognition, a meta-analysis is analysis of analyses. It is doing science about science. The "meta world" is beautiful.

Meta-analyses do not incorporate new information, but analyze existing information. Currently, the best known network of researchers specifically dedicated to doing this in medicine is the Cochrane Collaboration, which generates systematic reviews available to anyone interested in them.

We have discussed broadly what types of evidence we can have and how reliable, in principle, the information they provide is. We will soon see that not everything is so clear-cut and that sometimes there are problems.

ALL THAT GLITTERS IS NOT GOLD

Before evidence-based medicine, the only thing that guided health professionals was their clinical experience, which they transmitted to their students and disciples.

However, as we have seen, in real life not everything is so neat or linear. Medicine is a good example of how to tackle even more complex problems, how to design really effective public policies, because it has lost and then found its way several times. So we will stick with medicine, but always looking to see how it can help us identify similar problems and potential solutions in other fields.

Some difficulties that arise in medicine have to do with the fact that the hierarchy of evidence we mentioned earlier cannot always be followed to the letter. A well implemented and interpreted cohort study can be more informative, more useful, than a poorly conducted RCT. Sometimes it does not even make sense, or it is not feasible, to perform an RCT. Meta-analyses themselves can have problems; for example, if an effect is real but not too large, it can get "lost" in an analysis that takes into account evidence coming from many different sides. Besides the evidence itself, medicine also faces difficulties that have to do with how it is actually practiced, or with the influence of stakeholders who benefit from one decision or another.

As far as the evidence goes, let's take an example. In 2003, an ironic paper was published in the journal The BMJ that, beyond its jocular tone, was widely read and cited.

It was read and quoted, and I, by mentioning it here, am contributing to that. A sign that what is most widely disseminated is not necessarily the most relevant, but -as in this case- what, for some reason, gets our attention.

The paper was entitled "Use of parachutes to prevent death and serious injury related to a gravitational challenge: a systematic review of randomized controlled trials."⁷ As a result, they said, "We were unable to identify any RCTs of parachute use," from which they concluded that "parachutes were not subjected to rigorous evaluation by RCTs." The authors added this in their "conclusion": "Proponents of evidence-based medicine criticized the adoption of interventions that were only evaluated by observational data. We believe that all would benefit if the most enthusiastic champions of evidence-based medicine were to organize and participate in a double-blind, randomized, placebo-controlled parachute trial."

⁷ See Smith, G. C. S. and Pell, J. P. (2003).

This "work" is a sample of something very real: we cannot always conduct RCTs, so sometimes the best we can have is good observational data. However, this does not imply that nothing makes any difference and we should not bother demanding more reliable evidence.

Finding the problems with what’s available to us helps us improve it. Most medical practices are not like parachutes, and do require RCTs, argue the authors of a 2018 scientific paper in response to the one above. The title of the paper, published in CMAJ Open, is "Most medical practices are not a parachute: a citation analysis of practices that the authors consider parachute analogs."⁸

⁸ See Hayes, M. J. et al. (2018).

RCTs are still considered the gold standard in clinical research, but understanding their limitations can help avoid generating undue expectations. The fact that there is an RCT on a topic does not invalidate the observational evidence that may exist, nor does it make it unimportant to take it into account. Everything can be considered together. In fact, if the results of both approaches - the observational and the experimental - are consistent with each other, the confidence that we have the right answer to the question is enhanced. Therefore, that confidence is, in a sense, "more than the sum of its parts." If an RCT does not agree with what has been observed by other methods, the issue should be investigated further. It is not that observational evidence is discarded because it is considered to be of "poorer" quality. For this reason, in addition to producing evidence that is as reliable as possible, evidence from very different methodological approaches is generally sought.

This is central in medicine, and also in our daily lives and in other areas where evidence is sometimes invoked. Whenever we are told that "a paper was published showing that...", or that "researchers at the University of.... showed that...", we should ask ourselves, at the very least, how reliable that evidence is and what other evidence there is about the subject.

There are many situations in medicine where we cannot, or should not, conduct an RCT. For example, if we suspect that a substance might be carcinogenic because it proved to be so in animal experiments, we would not perform an RCT in humans to see if the treatment group gets more cancer than the control group. In the case of extremely rare diseases or conditions, with tens or at most hundreds of affected people in the world, RCTs are impractical (and uninformative), especially because the number of people in either group would be too small. For an RCT to yield moderately reliable results, the groups must be quite large, and this is even more relevant if the difference between the groups is very small. For very rare diseases, a good observational study may be preferable to an RCT.
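How large is "quite large"? A common textbook approximation for comparing the means of two groups gives a feel for it. The sketch below assumes a normally distributed outcome, a two-sided 5% significance level and 80% power. Those choices, and the function name `patients_per_group`, are ours, and an actual trial would need a proper statistical design.

```python
from math import ceil
from statistics import NormalDist

def patients_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate group size needed to detect a given difference in means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sd / difference)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = NormalDist().inv_cdf(power)           # desired chance of detection
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)

# Differences are expressed relative to the spread (sd = 1) of the outcome.
for difference in (1.0, 0.5, 0.2, 0.1):
    print(f"difference={difference:.1f} -> "
          f"{patients_per_group(difference, 1.0)} patients per group")
```

Because the difference appears squared in the denominator, halving the effect we hope to detect roughly quadruples the number of patients needed. That is why a disease affecting a few hundred people worldwide may simply not provide enough participants.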

Therefore, the hierarchy of evidence we have been discussing is valid, but we cannot follow it blindly without taking into account these kinds of contextual issues. Much, much, much less can we consider it applicable, without any reflection, to other fields of knowledge beyond medicine. Let us remember that there are entire areas of science in which we cannot perform RCTs (climate, astrophysics, evolution), and this does not mean that these are less "scientific" disciplines.

Evidence-based medicine allows us not only to know what works and what doesn't, but to understand relative risks, probabilities, or whether the potential benefit of something is greater than the risks involved.

It is not a perfect system by any means. Very often, decisions are made based on evidence that is not strong enough. Other times, outside interests, from pharmaceutical companies, insurance companies, etc., influence the decision-making. The scientist John Ioannidis is more extreme, and considers that "clinical medicine has been transformed into medicine based on finance".

Of course, true evidence-based medicine takes into account whether or not there is sufficient evidence of quality and ensures that there are no influences from interested groups. It is essential that evidence-based medicine not be just an empty label intended to lend credibility, but that it reflect that adequate quality standards were applied to the process. As we demand that this actually happen, let us not forget that the alternatives are even less reliable.

EVIDENCE-BASED MEDICINE

In this chapter we showed how evidence, which we introduced in the previous chapter, bears on medicine. We did so for three main reasons.

One: to illustrate that there are areas in which evidence is essential as a foundation. Today, medicine is nourished by evidence, but it is not limited to it, for several reasons. First, although the evidence may not always be clear and complete, a decision cannot be postponed until it is. Therefore, in evidence-based medicine, decisions are based on the best available evidence. Second, beyond the evidence, the physician's experience, expert intuition and traditions play a role. The practice of medicine is a mixture of science and art, in which we must demand evidence of ever better quality and accept that it will never be enough to exclude the physician and the patient from the equation.

Two: because post-truth sometimes challenges medical issues; some groups believe that vaccines are dangerous, adhere to "alternative" medical treatments that are not known to be effective or are known to be ineffective, or even refuse to accept that some viruses or bacteria cause disease. This will be taken up later in the book, but let us say here that post-truth in medicine puts those who believe these ideas at risk, but it may also be dangerous to the rest of society, for example, by allowing the spread of disease.

I know that this is a delicate issue and that, if someone feels alluded to by these words, he or she must think that I do not know what I am talking about, that I cannot say it this way, recklessly and groundlessly, or that I was deceived or manipulated. I ask that person for a little patience: later on I will try to unpack the post-truth mechanisms at work in these cases.

Three: because talking about medicine is a good intermediate example between what is typical of scientific fields - in which the mechanisms for generating, validating and accepting evidence are protected and accepted by the community - and what happens in the "real world", where problems are more complex and involve not only evidence, but also conflicts related to our individual and social behavior. Many of these complex problems end up distorted by post-truth, but we cannot address them without making the role of evidence a little clearer.

The historical view of medicine can be useful for other issues in which the incorporation of evidence in decision-making has been slower. For example, when a State decides what to do with respect to health, education, security, etc. (what is known collectively as public policy), it often relies on tradition, intuition or anecdotal evidence, or even responds to the influence of interest groups. Public policies are supposedly designed to improve the lives of citizens, but without incorporating the evidence of whether they work or not, it is difficult to assess whether they are really effective in achieving that purpose. Public policy is not unlike medicine in the 18th century: a few small attempts here and there to find out if what is being done actually works, lost in the midst of decisions being made in other ways.

How can we assess whether a medical claim is supported by evidence? What should we focus on? The time has come to introduce the second Pocket Survival Guide, which seeks to guide us in what questions we might ask ourselves to know whether or not a medical claim can be trusted.

Thus, we add to our toolbox a new set of questions suitable for any medical topic, regardless of our expertise in that particular subject. The tools in our Survival Guides are not meant to teach us more about each field, but rather to be aware of the processes and validation mechanisms behind them.

As we said, this approach can be applied to any factual issue, and what happens to citizens is factual. We'll get to that and include the complexity that comes with post-truth. But, for now, we add these new tools, related to medicine, to our toolbox and move on to address the problem of evidence consensus and the uncertainty that surrounds it.

POCKET SURVIVAL GUIDE #2
HOW TO DECIDE WHETHER OR NOT TO TRUST A MEDICAL STATEMENT?
Is the medical claim supported by evidence (observational and/or experimental)?
Could other factors such as traditions or anecdotal evidence be playing a role?
According to the hierarchy of evidence, is the evidence of "good quality" (e.g., clinical trials or meta-analyses)?
Do the various pieces of evidence broadly agree with each other?
Does the statement agree with the "good quality" evidence or contradict it?
If the evidence is not conclusive in either direction, could the medical decision be influenced by other opinions or interests? Which are those? Do we share them?