Chapter 22. Some Problems Are Intrinsically Difficult

The scientific method leads to faster progress on some problems than others. Like children's puzzles and games, scientific problems vary in tractability from easy to difficult. Tic-tac-toe is easy to master. Jigsaw puzzles require more patience, but with persistence can be solved by nearly anyone. Rubik's cube is mind-boggling.

In the last half century we have built bombs that could catastrophically alter our climate for centuries to come, yet we remain surprisingly inept at predicting the weather even a week in advance. Although we have eliminated smallpox, progress on curing cancer has been painfully slow. We can describe with some accuracy the flight of a baseball, but are inept at predicting the outcome of sporting events.

There are good reasons why ignorance persists about the weather, sports contests, and many other phenomena in spite of continual assault by the scientific approach. The very nature of these problems makes them difficult. Of course, the rate of progress is also controlled by factors other than the difficulty of the problem, including the amount of money spent and the number of people working on the problem. But these social factors will not concern us. Here we will consider four factors that make some scientific questions intrinsically difficult: (1) time lags, (2) rarity, (3) interactions, and (4) the difficulties of using human subjects in experiments.

1. Time lags slow progress

For some problems, the data needed to test models can only be gathered slowly. Consider the procedure we use in adjusting the temperature of a shower and the dial of a radio. Both involve a simple application of the scientific method: We start at some initial setting, evaluate the setting, readjust the setting, evaluate the new setting, and repeat the process until the desired goal is achieved.

Because we obtain the needed data faster, progress in setting the radio dial occurs more quickly than progress in adjusting the shower temperature. The difference is due to time lags. Shower handles are typically several feet removed from the shower head, and it may take up to a minute before an adjustment at the handle translates into a change in the temperature of the water on our skin. By contrast, turning the radio dial produces a nearly instantaneous change in the radio's output. Because gathering data about shower temperature involves a longer time lag than gathering data about the radio's setting, it takes longer to adjust the shower.

Time lags abound in our world. Some spectacular and renowned cases involve the orbits of heavenly bodies. Those of us who did not observe Halley's comet on its last pass are not likely to have another chance, because the time lag is about 76 years. No one knew to expect the Hale-Bopp comet in 1997 because it had not been observed for thousands of years. We all fear overexposure to radiation because it increases the risk of cancer. However, the onset of cancer typically follows exposure to radiation by many years (roughly 5 years for leukemia and 20 years for other cancers). In economics, the Federal Reserve Board (the "Fed") attempts to influence the U.S. economy by adjusting interest rates; the effect of a change in interest rates takes months to be translated into an impact on the economy. And couples wishing to become parents must wait at least 9 months if they do the job by the classical method, and often much longer if they wish to adopt.

Not only do time lags increase the cycle time of the scientific method, but they are also sometimes so long as to escape detection altogether. The first point is evident from our shower analogy. The longer the delay between faucet adjustment and temperature change, the longer it takes to find an acceptable setting. It may take the same number of adjustments to set the shower as the radio dial, but it simply takes more total time when the time lag is long.
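
To make the point concrete, the following sketch (written in Python, with made-up lag values and a made-up adjustment rule) runs the same adjust-and-evaluate loop twice, once with a radio-like lag and once with a shower-like lag. Both runs need the same number of adjustments; only the waiting differs.

    def adjust_until_acceptable(lag_seconds, start, target, step=1.0, tolerance=0.5):
        # Adjust a setting toward a target, waiting out the time lag after
        # each adjustment before the result can be observed.
        setting = start
        cycles = 0
        waiting = 0.0
        while abs(setting - target) > tolerance:
            setting += step if setting < target else -step
            waiting += lag_seconds   # time spent waiting for feedback
            cycles += 1
        return cycles, waiting

    # Hypothetical lags: a radio responds in ~0.1 s, a shower in ~30 s.
    for name, lag in (("radio", 0.1), ("shower", 30.0)):
        cycles, waited = adjust_until_acceptable(lag, start=10.0, target=20.0)
        print(f"{name}: {cycles} adjustments, {waited:.1f} s spent waiting")

With these assumed numbers, both devices need the same ten adjustments, but the shower costs minutes of waiting while the radio costs about a second.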

The second point --- the possible failure to recognize a time lag --- is more subtle and more sinister. If a time lag is extraordinarily long, we may be unable even to determine that it is present. For example, if the cancer rates we experience today were determined by chemical exposures our grandfathers received when they were 10 years old, it would be nearly impossible to discover the effect.

Table 22.1 Examples of time lags

Long time lags

Greenhouse warming. There is a lag of decades between industrial activities that increase atmospheric carbon dioxide, and any resulting change in climate.

Weight loss diets. Only after weeks or months on a diet do you achieve significant weight loss.

Diet and heart disease. Many people eat a high fat diet for decades before suffering a heart attack.

Short time lags

Computers. A computer responds almost instantly when you type in a command; the data you want can be obtained with essentially no time lag.

Steering a ship. There is an obvious time lag between turning the steering wheel of an ocean liner and the actual change in direction of the ship. Pilots also face time lags in landings and take-offs because the momentum of the plane does not change quickly in response to the controls.

 
Social Problems Created by Long Time Lags

Consider the difficulties posed by time lags in drug testing. If you are testing a new drug, how long should the participants in the study be followed before you can conclude that the drug is safe? Is one year sufficient? Five years? Imagine how our drug-based health care would be affected if trials had to run for even 10 years before a product could be approved. New companies would have to find sources of revenue for at least 10 years before they could begin marketing their first product. The shelves are full of drugs that would not be available under such rules, and of course, improvements in those drugs would be even longer in coming.

The U.S. drug marketplace has witnessed such a problem. From the late 1940s until 1970, the drug DES (diethylstilbestrol) was administered to many women early in pregnancy to prevent miscarriage. It was only later discovered that its use increases the cancer rate in the offspring, 20 to 40 years after exposure to the drug. If the drug had caused cancer immediately (e.g., in the pregnant women or their newborns), it would have been contraindicated long before 1970, and many fewer people would have developed DES-caused cancer. Similarly, a 1993 trial of a hepatitis B drug killed 5 of the 15 volunteers. Part of the reason so many died was that the lethal effect of the drug was somewhat delayed; a time lag had not been anticipated.

There is no definitive solution to this dilemma. Countless drugs that we are taking now could be having delayed effects. Some compromise must be struck between the conflicting goals of testing adequately to ensure safety and maximizing the number of effective drugs available. Actions that the government takes to increase the safety of available drugs (by requiring that tests run for longer periods of time) will often prevent or delay safe and effective drugs from coming to market, because a drug cannot be sold while the experiments determining its safety are under way. Moreover, regardless of the number of tests undertaken, there is no way to be absolutely certain that a given drug is safe.

Another set of problems arises because long time lags make it difficult to determine who is to blame for poor performance. Is the current recession a consequence of the policies of the current president, or of a predecessor? Are a company's earnings in the first year after a new CEO is hired a consequence of his or her actions, or of a predecessor's? In both cases, uncertainty about the duration of a time lag obscures the answer.

Avoiding Time Lags

Time lags are common problems, and scientists have discovered ways to lessen their impact. A common approach is to study alternative models that incorporate a shorter time lag (see Table 22.2). The utility of an alternative model with short time lags depends on its similarity to the main model in question. Viruses (bacteriophages) and fruit flies yielded major insights into human genetics because they have vastly shorter generation times than humans do.

Perhaps the biggest difficulty is posed by unexpected time lags, as with DES-caused cancer. If you aren't expecting a time lag, there's not much you can do about it until you stumble on it.

Table 22.2. Models that reduce time lags.

Genetics of viruses. Their short life cycle led to rapid understanding of genetic principles that apply to nearly all life.

Rodents are used in cancer research because their short life span, relative to humans, enables testing for otherwise long-term effects.

Flow charts enable coordination of different dimensions of complicated construction and other social projects, so that excessive delays are avoided.

Political polls provide politicians with rapid feedback about public perception of their performance. Politicians can change their positions and behavior in response to a poll, so they do not have to wait for an election to discover their popularity.

Sneak previews enable marketing agencies to anticipate public reaction to a product before it is made widely available. Changes in packaging and marketing strategy can occur much more quickly if they only affect a small market. Once the bugs are ironed out in a small market, the resulting marketing strategy is then used nationally.

Early reviews and advance advertising. A company may speed public awareness of a product prior to or coincident with its availability in the marketplace.

 

2. Rare events are difficult to measure

We have all experienced the frustration of a car, stereo, or other complicated machine failing us, only to face the embarrassment of the machine working perfectly when brought in for repair. It is usually easier to fix something that consistently fails than to fix an intermittent problem. A common solution is to simply ignore an intermittent problem until it worsens.

The difficulty of a scientific problem depends heavily on the frequency of the event being studied. Models of rare events improve only slowly. Inconsistent or uncertain results increase the number of observations that must be made -- the number of samples that must be taken -- before we can make progress. For example, it does not require many coin tosses to realize that we are being cheated with a two-headed coin. But to detect whether a casino's slot machine offers "fair odds" of a win, we might need to pull the lever thousands or hundreds of thousands of times. Thus, when the event we seek is extremely rare, the problem can become physically insurmountable.
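
A rough calculation illustrates the contrast. In the sketch below (Python; the slot machine's payout rate and the precision target are illustrative assumptions, not casino figures), a two-headed coin betrays itself within a handful of tosses, while pinning down a rare payout rate takes hundreds of thousands of pulls.

    # A fair coin gives n heads in a row with probability 0.5**n,
    # so a two-headed coin is exposed after only a few tosses.
    for n in (5, 10, 20):
        print(f"{n} straight heads from a fair coin: probability {0.5 ** n:.6f}")

    # Suppose the machine pays out roughly 1 time in 1,000 (an assumed figure).
    # To estimate that rate to within +/-10% of its value at ~95% confidence,
    # a normal approximation requires about (1.96/0.10)**2 * (1 - p)/p pulls.
    p = 1 / 1000
    pulls = (1.96 / 0.10) ** 2 * (1 - p) / p
    print(f"Lever pulls needed: roughly {pulls:,.0f}")

Under these assumptions the slot machine requires several hundred thousand pulls, which is why rare events can make a problem practically insurmountable.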

There are many kinds of rare events that confront us, many of them undesirable (Table 22.3). Any one of these events is rare enough that we are likely to ignore its possibility, but there are so many rare event possibilities that they pose a threat collectively. And from a social policy perspective, an individually rare event can still mean thousands of cases in a population the size of the U.S.

Table 22.3 Rare events in our personal lives

Adverse reactions to common drugs and vaccines

Side effects of food additives

Transportation accidents

Equipment failures on airplanes and space shuttles

Cardiac arrest under anesthesia

Large liability awards against insurance companies

Floods, tornadoes, lightning, and hurricanes

Leukemia

Winning the lottery

 
Measuring a Rare Event Can Involve Enormous Sample Sizes

Childhood leukemia is one of the few cancers that occur at appreciable levels in children. The disease is fatal unless halted with an extremely radical and difficult treatment. In the U.S., about 1 in every 20,000 children will develop leukemia before reaching adulthood. This number is a baseline, or average, rate. We would obviously like to reduce the number of cases below 1 in 20,000, but we also want to guard against environmental changes that increase it.

Studies over the last 15 years have suggested that the childhood leukemia rate may nearly double due to exposure to intense electromagnetic fields --- the sort of everyday radiation emitted from electric appliances, power lines, and transformers atop telephone poles. Even though a doubling of this rate still means that each individual has an excellent chance of avoiding leukemia, the doubling would constitute a serious increase in the number of childhood leukemia cases in a country the size of the U.S.

With a rate of 1 in 20,000, we expect only 5 cases among 100,000 children, or 10 cases if the rate is doubled. Yet if we indeed observed 5 cases out of 100,000 in one group and 10 out of 100,000 in another, the difference between 5 and 10 is not large enough to convince us that pure chance is not responsible for the discrepancy. Even larger numbers of individuals would need to be sampled. Herein lies the problem: a sample of 200,000 children is not adequate for detecting even a doubling of the leukemia rate. When we consider that a variety of data must be collected on each child, the scale and cost of the problem become staggering.
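
A small simulation shows how weak the evidence from 200,000 children would be. In the sketch below (Python with NumPy; the rates and group sizes come from the text, while the number of simulated studies is arbitrary), chance alone separates two unexposed groups by 5 or more cases a sizable fraction of the time, whereas a genuine doubling produces a gap that large only about half the time.

    import numpy as np

    rng = np.random.default_rng(0)

    n_children = 100_000    # children per group
    p_base = 1 / 20_000     # baseline childhood leukemia rate
    p_doubled = 2 * p_base  # hypothetical doubled rate
    n_studies = 100_000     # number of simulated studies

    # How often does chance alone separate two unexposed groups by 5+ cases?
    a = rng.binomial(n_children, p_base, n_studies)
    b = rng.binomial(n_children, p_base, n_studies)
    print(f"Gap of 5+ cases by chance alone: {np.mean(abs(a - b) >= 5):.0%} of studies")

    # How often does a real doubling show up as a gap of 5+ cases?
    exposed = rng.binomial(n_children, p_doubled, n_studies)
    control = rng.binomial(n_children, p_base, n_studies)
    print(f"Gap of 5+ cases with a real doubling: {np.mean(exposed - control >= 5):.0%} of studies")

Because the two situations overlap so heavily, a single study of 200,000 children cannot reliably distinguish a doubled rate from ordinary luck.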

A Dilemma for Business

Although we have illustrated how the difficulty of measuring rare events can harm ordinary citizens, these same problems also plague businesses in pursuit of their goals. Suppose that a product is tested on 1,000 subjects and found to be satisfactory and safe. If it is hazardous to 1 out of 10,000 people, then even this extensive study is likely to miss the hazardous effect. Yet when the product is marketed, it may come in contact with millions of people, and its drawbacks will become obvious from the hundreds of people who suffer from it. Liability costs for even a few of those afflicted could easily wipe out all profits. This problem applies to manufacturers of drugs and food additives, obviously enough, but also to manufacturers of fabrics, household chemicals, equipment, toys, and an innumerable list of other items with which physical accidents may occur.
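
The arithmetic behind this dilemma is short enough to write out. In the sketch below (Python; the trial size and hazard rate come from the paragraph above, while the eventual customer count is an assumed figure), the safety study usually sees no cases at all, yet the marketplace produces hundreds.

    p_harm = 1 / 10_000      # fraction of users harmed (as assumed in the text)
    n_trial = 1_000          # subjects in the safety study
    n_customers = 2_000_000  # eventual users of the product (assumed)

    # Probability that the trial observes zero harmed subjects.
    p_trial_misses = (1 - p_harm) ** n_trial
    print(f"Chance the 1,000-subject trial sees no cases at all: {p_trial_misses:.0%}")

    # Expected number of harmed customers once the product is widely used.
    print(f"Expected harmed customers in the marketplace: {n_customers * p_harm:.0f}")

With these assumptions, the trial misses the hazard about 90% of the time, while roughly 200 customers are eventually harmed.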

Sometimes It's Impossible to Obtain an Adequate Sample

The rapid urbanization of the last century notwithstanding, much of the U.S. is populated by small communities of a few hundred to a few thousand people. Importantly, an environmental hazard may increase the incidence of cancer, birth defects, or miscarriages, yet the entire community may be so small that there is no statistical basis for demonstrating an ill effect of the hazard.

Furthermore, a corporation exposing a small village to a toxic chemical, for example, may be virtually immune from legal accountability (provided that people are not killed or hospitalized en masse), because too few cases will ever come to pass. Disputes between small communities and large corporations spraying herbicides have in fact occurred over this very point. Similar debates have arisen over whether emissions from chemical manufacturers have increased the number of cases of anencephaly (babies born with essentially no brain) in small communities along the Texas-Mexico border.

Even if a suspected hazard such as a toxic waste site or gasoline tank farm lies in a large city, there is no guarantee that enough people will be affected to produce convincing scientific evidence that the suspected hazard really is harmful. If the site increases cancer rates only among residents who live within several blocks of it, then it is likely that only a very small number of people will contract cancer because of it. Even though the site is in a large city, the scientific issues are very similar to those encountered in understanding environmental hazards in small communities.

Related Problems: Dispersed Impacts and Events That Aren't Replicated

There are some obvious generalizations and extensions of rare-event problems. One is dispersed effects: a large number of people are affected, but they are not clustered in any obvious way. Dispersion is a common problem in the detection of infectious diseases and is an acknowledged problem in bioterrorism awareness. For any one type of food item, there are relatively few processing centers in the country. For example, most lettuce used in restaurants and fast-food chains is chopped at a few sites. Suppose one of those sites was contaminated with an infectious bacterium that caused 400 consumers to get sick: the effect would be a distributed outbreak of the illness, but only 2 to 4 cases per major city. If the sickness was nothing out of the ordinary (e.g., diarrhea, with recovery in 3 days), the contamination would go undetected. If all 400 illnesses happened in one city, it might well be detected. Indeed, the clustering of illnesses was critical to the detection of an E. coli O157 outbreak in the Seattle area a few years ago (known as the "Jack in the Box" episode); there had been a similar outbreak in Nevada years earlier that had gone unnoticed. Likewise, the discovery of hantavirus infections in the U.S. was accidental, made only because of a geographic cluster of illnesses in the Four Corners area.

A second difficulty in applying the scientific method is that some events cannot be replicated. Historical events are the most obvious, and some of the controversy and angst in our society revolves around past events that weren't sufficiently documented and don't have satisfactory answers (the Kennedy assassination, supposed aliens near Roswell). Some types of large-scale events have the same problem: they can only happen once, because the whole population is affected. (The mass polio vaccination with the live Sabin vaccine around 1960 comes to mind, but this problem affects the implementation of many government programs. Likewise, the HIV epidemic is unique for the size of its impact on the world.) In many large-scale events, there will be components that are replicated (e.g., in the HIV epidemic, infections are replicated millions of times) but also components that are unique (e.g., the worldwide economic impact of the massive toll).

Overcoming Rarity

If a large sample can't be obtained, there are several alternatives that enable us to sidestep the problem posed by rare events. The general solution is to turn to alternative models that facilitate the observation of large numbers of cases.

Models of surrogates. Although cancer is a common affliction of humans, the development of specific cancers in response to specific factors is rare (e.g., the leukemia risk from increased levels of radiation is not very high). To assess cancer risk, some studies instead look at abnormalities other than cancer, such as chromosome aberrations in blood cells or precancerous growths, and other studies assess mutation rates in bacteria, which can be analyzed by the billions. These cancer surrogates are chosen because they are thought to accurately reflect the likelihood of developing cancer and because they occur at higher frequencies than the cancers themselves. In the same vein, one could use the near-collisions of aircraft to study the factors influencing actual collisions, which are themselves exceedingly rare.

Inflating the rates. The world is heterogeneous, and when science studies a rare phenomenon, there may be special circumstances in which the phenomenon is common or can be rendered common. To test equipment failure, it is often a simple matter to stress equipment under laboratory conditions to increase its rate of failure, thereby obtaining information about its failure under more normal circumstances. In medicine, rats are often subjected to extremely high doses of substances, to increase the frequency of any ill effects that might be felt by a tiny minority of human consumers. And medical models with inflated rates are not always rats. People who for whatever reason receive higher-than-normal doses of radiation, alcohol, and other drugs are sometimes studied specifically for the purpose of determining risks of lower doses. Airline cockpit simulators can mimic unusual combinations of events, thereby increasing a pilot's ability to survive adverse conditions.

Tracing causal chains. We have implicitly assumed in this chapter that in order to demonstrate that, say, a toxic waste dump causes cancer in nearby residents, you must establish a correlation, or association, between the waste dump and cancer. This assumption is not completely valid. If we can understand why a rare event occurs, we may be able to draw reliable conclusions even with small sample sizes. Earlier in this chapter, we pointed out that it would likely take many thousands of pulls of a slot machine lever to determine the odds of winning. But there is a more direct approach: simply open the machine up and look at how the odds have been set. (However, we don't recommend trying this on a casino floor.) As we discussed in the chapter on correlation and causation, many problems can be attacked similarly. Examples include scientists identifying the particular genes that an environmental hazard causes to mutate, and Secret Service agents studying the details of past presidential assassinations to understand the psychological profile of assassins and the circumstances in which a threat is likely to develop.

3. Pitfalls of complexity

The arch-villain Joker in the 1989 movie Batman devised a plan to poison the citizens of Gotham City. Rather than simply put a single poison into one product, Joker used a poison which required the combined effects of multiple ingredients. No single product was by itself toxic. Batman discovered the formula to Joker's toxic scheme, and the public was advised accordingly: "avoid the following combinations: deodorants with baby powder, hair spray, and lipstick."

The sinister dimension of Joker's plan is readily apparent, because we can all appreciate how difficult it would have been to discover that a combination of products was deadly. Several years ago, when a real villain was lacing bottles of Tylenol with cyanide in the U.S., the problem was simple enough to trace, because a single product was the source of the poison. But imagine the difficulty of tracing the problem if a combination of three products was toxic, and each of those products by itself was innocuous. Likely, many people would die before anyone determined that a particular combination of products was fatal.
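
The combinatorics explain why. As a rough illustration (Python; the figure of 1,000 consumer products is an arbitrary assumption), compare the number of suspects an investigator faces when one product is toxic with the number faced when only a particular trio of products is toxic.

    from math import comb

    n_products = 1_000  # assumed number of common consumer products

    # Number of single products, pairs, and trios that might be the culprit.
    for k in (1, 2, 3):
        print(f"Combinations of {k} product(s) to check: {comb(n_products, k):,}")

A single toxic product gives 1,000 possibilities to check; a toxic trio gives more than 166 million.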

The phenomenon that underlies this example is an interaction among many factors: we cannot discern the whole from a sum of the parts. This is a problem because science typically functions in the same way that we construct a jigsaw puzzle. That is, although the problem involves many pieces and is overwhelmingly complex, progress is made one piece at a time, building on previous successes. Most improved models are relatively minor modifications of their predecessors. But suppose that the puzzle consisted of many pieces, each of which could fit with several other pieces, yet only one combination enabled all pieces to fit together. In this case, we could make many starts, only to find that they invariably led nowhere.

Interactions are ubiquitous in our lives (Table 22.4). Even events from the non-scientific and non-industrial side of our lives involve them at one level or another: a joke without its punchline is not half funny.

Table 22.4 Interactions of common experience

Flash powder. Ingredients: a mixture of magnesium powder and potassium nitrate, plus energy. Result: explosion. Basis of interaction: neither ingredient alone generates a reaction.

Lethal gas. Ingredients: household bleach mixed with ammonia cleansers. Result: toxic chloramine gas. Basis of interaction: each cleanser is safe when used alone.

Atomic bomb. Ingredients: a critical mass of plutonium or uranium. Result: a chain reaction of atomic disintegrations. Basis of interaction: half of a critical mass does not release half the energy of a critical mass.

Cooking recipes. Ingredients: various spices and food items. Result: prepared meals. Basis of interaction: eating the prepared meal has greater appeal than eating each ingredient separately.

Drug complications. Ingredients: different drugs designed for different purposes. Result: drug-induced death or illness. Basis of interaction: when used separately, the drugs produce positive health effects.

 
Why Interactions Are a Problem

The problem posed by interactions stems from an inability to extrapolate from one model to a new one. For someone cleaning around the house, it seems perfectly logical to mix different cleansers to reduce the number of times a surface needs to be cleaned. Indeed, many household products and over-the-counter drugs actively advertise a multiplicity of components --- the all-in-one principle. But occasionally, the combination of two or more safe ingredients holds a surprise, such as toxic chloramine gas.

The extension of this principle from ordinary problems to scientific ones is simple, as is a realization of the difficulty it poses. Time and again, science fails to give us advance warning of dangerous interactions, and people are injured or die before we arrive at an adequate model to explain the phenomenon. For example, the deadly combination of sedatives and alcohol was discovered by trial and error. The deaths of a few celebrities in the 1950s and 1960s made this interaction well known. The history of new discoveries in applied chemistry is replete with examples of botched protocols that led to completely unexpected results.

Avoiding the Pitfall

To a large extent, science is simply saddled with this problem. The recent Nobel Prize awarded for the discovery that unusual combinations of materials have superconducting properties reflects the difficulty of such problems. Two kinds of approaches help overcome this general problem, but neither is a completely satisfactory solution: models of mechanisms (or, equivalently, causal chains), and models of single components.

Tracing causal chains. This is exactly the same principle we have already discussed in reference to rare events, and in the chapters on causation and correlation. The atomic bomb, for example, was not discovered by accident; rather, it was predicted from knowledge of the radioactive disintegration products and energies of specific uranium and plutonium isotopes. In this case, an explosive chain reaction results when the sum of many individual fissions reaches a critical threshold.

Models of single components. In many other cases, complex interactions can be anticipated by first looking at one or more of the ingredients separately. The driving force in gunpowder and flash powder is an oxidizing chemical. Although explosions will result from specific combinations of ingredients, the oxidizing agent is capable of sustaining combustion with a much wider range of ingredients, so it becomes a simple matter to explore different combinations to optimize the rate of combustion.

4. Humans make difficult experimental subjects

There are many problems facing humans that could be ameliorated using the scientific method, except that the "ideal" experiments cannot be conducted because they involve humans -- they would be unethical, too expensive, or simply impractical. The ethical barrier comes first. Consider how you would react if you discovered that the government or your employer had exposed you to high doses of radiation without your knowledge, or had tested drugs on you without your approval. These kinds of manipulations are routinely performed on non-human organisms, but we do not permit them to be conducted on ourselves, even when such manipulations could be most useful in solving an important problem.

Second, some manipulations of humans that are not unethical are nonetheless not feasible. Studies requiring humans to voluntarily change their behavior (for example, by adopting a particular diet) pose the obvious problem that the subjects may not comply with the regimen. If the manipulation calls for an extreme change in behavior over a long period, the experiment is probably not feasible.

Experimental studies of human behavior constitute a gray area in terms of ethics. Most of us would likely frown on an experiment that involved teaching children to fear common, harmless objects. And many people would object to being "experimented upon" without being informed of it. Yet that is essentially what advertisers and many other businesses do when they gather data on the effectiveness of different product-promotion techniques. When an advertising agency generates two versions of the same ad and compares the sales they generate, it is performing an experiment on its customers. Some ads are designed to create an aversion to a competitor's product, in the same spirit as the psychologists who famously taught a little boy to fear white rats. Furthermore, the customers do not know they are involved in the experiment, and quite probably do not know that such experiments are regularly done. Businesses not only attempt to discover our preferences and dislikes, but they also attempt to alter our behavior in ways that benefit them --- actually teaching us to enjoy their products and dislike others. The very basis of capitalism, by which some products survive and others fail, is itself an ongoing set of experiments in human behavior.

 


Copyright 1996-2000 Craig M. Pease & James J. Bull