Big Science: The Human Genome Project and the Public Funding of Science
Government support of scientific research has strong influences on the direction of that research; this policy for science significantly affects what research is done and what knowledge might be gained. This chapter explores the relationship between the government funding of basic research in biology and the researchers themselves.
Who should decide what research will be supported? What criteria should be used to determine what research will be funded? Can scientists influence the decision-making process? The Human Genome Project (HGP) set out to map the entire genetic complement of human beings, a total of three billion base pairs of DNA.
Accomplishing this task meant a major commitment of funds by the federal government, beyond what was already being granted to support research.
How did scientists persuade the government that the project was worthwhile?
Federal Funding of Research and the HGP
The history of the HGP stands in contrast to the typical investigator-initiated funding of research in the biological sciences (see “Funding Biomedical Research” section below). We are familiar with big science projects in the physical sciences, such as the Manhattan Project to develop the atomic bomb, the International Space Station, and a series of multibillion-dollar physics projects on nuclear fusion and particle accelerators (some of which were canceled after expenditures of billions of dollars because of cost overruns and changing budget priorities). The HGP is the first genuine example of big science in the biological sciences, with an estimated original price tag of $3 billion. It should be noted, however, that $3 billion spread over ten to twenty years represents only a tiny fraction of the federal research and development budget for the biological sciences. The development of the HGP demonstrates both the power that politically astute scientists have in directing government decisions about funding and the effect that major funding in an area of biology can have on the direction of new research.
The federal government began to fund scientific research at significant levels in the latter half of the twentieth century, after Vannevar Bush, head of the Office of Scientific Research and Development in the Roosevelt administration, developed a plan to continue support of scientific research after the end of World War II.
Bush’s recommendations laid the foundation for the establishment of the NSF, the NIH, and other federal agencies supporting scientific research. Central to Bush’s recommendations were two principles: that granting agencies would be autonomous entities run by scientists rather than career administrators, and that the direction of research would be determined by the scientists themselves.
The budgets of federal granting agencies have grown enormously in fifty years. In fiscal year 2005, the projected support for research in the life sciences was $25.5 billion at the DHHS, $578 million at the NSF, $289 million at the Department of Energy (DOE), $1.4 billion at the U.S. Department of Agriculture (USDA), and $695 million at the Department of Defense (DOD), most of the DOD funds being targeted for bioterrorism research. This combined amount represents only a tiny fraction of the country’s total budget of several trillion dollars. The availability of funds for research in areas determined by federal agencies influences the directions and limits of research in the biological sciences. Funding through the NIH has been particularly generous in the biomedical sciences, with the stated justification that such research will ultimately benefit the health and welfare of Americans. For researchers in non-health-related areas of biology, funding has been more limited and is awarded primarily through the NSF and the USDA.
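The FY2005 figures above can be totaled to show how heavily life-sciences support is concentrated in the DHHS. The sketch below covers only the five agencies listed in the text, so its total is not necessarily all federal life-sciences spending; amounts are in millions of dollars.

```python
# Projected FY2005 federal support for life-sciences research, in millions of
# dollars, for the five agencies listed in the text (other agencies not included).
FY2005_LIFE_SCIENCES_MILLIONS = {
    "DHHS": 25_500,
    "NSF": 578,
    "DOE": 289,
    "USDA": 1_400,
    "DOD": 695,
}

# Combined support across the listed agencies.
total_millions = sum(FY2005_LIFE_SCIENCES_MILLIONS.values())

# Each agency's fraction of that combined total.
shares = {
    agency: amount / total_millions
    for agency, amount in FY2005_LIFE_SCIENCES_MILLIONS.items()
}

if __name__ == "__main__":
    print(f"Total: ${total_millions / 1000:.2f} billion")
    # Largest share first.
    for agency, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{agency:>5}: {share:6.1%}")
```

The listed agencies total roughly $28.5 billion, of which about 90 percent is DHHS support.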
Until the 1980s, the federal government funded the majority of research in the biological sciences. Since then, however, the proportion of research dollars coming from industry has increased dramatically; recent estimates are that over 60 percent of biomedical research is supported by industry. The implications of such support will be discussed later.
Beginnings of the HGP
The seeds of the HGP were sown in the mid-1980s by scientists working independently [5, 6]. The first was Robert Sinsheimer, then chancellor of the University of California at Santa Cruz (UCSC). Sinsheimer proposed that UCSC might develop an Institute on the Human Genome and thereby bring the biology program at UCSC into greater prominence. In 1985 he convened a meeting of leading molecular biologists at UCSC. The group agreed that a project to develop a large-scale genetic linkage map, a physical map, and the capacity for large-scale DNA sequencing was both appropriate and feasible. The gathered scientists judged wholesale sequencing of the entire human genome not to be technically possible; instead, they proposed that sequencing targeted regions would be of interest. Sinsheimer explored sources of funding for such a project (including going directly to Congress [see “Funding Biomedical Research” section below]) but was not successful. One of the attendees at the Santa Cruz meeting, Walter Gilbert, a 1980 Nobel Prize winner for his work in molecular biology, became the HGP’s strongest proponent in the years that followed.
The second champion of an HGP was Charles Delisi, who became director of the Office of Health and Environmental Research (OHER) at the DOE in spring 1985. OHER had funded projects to investigate the effects of radiation on Japanese survivors of the bombings of Hiroshima and Nagasaki. Delisi reasoned that the search for DNA mutations might be extended into a project to map and sequence the human genome. He proposed that the national laboratories that had grown out of the Manhattan Project and other weapons programs could be redirected to study the human genome.
Delisi convened a meeting in Santa Fe in early 1986. Again the consensus was that genetic linkage and physical mapping were feasible. Unlike Sinsheimer, Delisi was in a good position to influence political decisions concerning funding. He had easy access to government officials who controlled funding and also managed considerable funds himself within OHER. In spring 1986, he submitted a proposal to the director of the DOE for initial funding for the project, stressing that the DOE was well situated to provide leadership for a major, multiyear endeavor. His proposal, for $78 million from 1987 to 1991, was passed on to the White House Office of Management and Budget, where it also gained support. With this backing, Delisi redirected $5.5 million of his 1987 budget toward human genome research.
The third scientist who had an early influence on the HGP was Nobel laureate Renato Dulbecco, then president of the Salk Institute for Biological Studies in California. He published a commentary in Science magazine in 1986, suggesting that cancer research would be aided by detailed knowledge of the human genome. This article raised awareness among a broader scientific audience of the possibility of sequencing the human genome. In summer 1986, a conference attended by over three hundred molecular biologists was held at Cold Spring Harbor, New York. At an open session, Gilbert reported on past meetings and suggested that the genome might be sequenced for $3 billion. The idea of mapping the genome was viewed favorably in principle. However, strong objections were voiced concerning the intellectual value of a complete sequence, given that only a small fraction of the genome coded for actual genes. To its critics, the project represented the worst characteristics of “discovery science,” being repetitive, tedious, and without underlying hypotheses.
Many feared that the high cost of such a project would undermine investigator-initiated research by redirecting limited resources to the sequencing effort. Others questioned whether the DOE should manage the project, since expertise in molecular biology appeared to lie elsewhere.
In 1987, Gilbert announced plans to form a company to carry out the genome project. He had already helped found the Swiss company Biogen in 1978, one of the first companies established to pursue commercial goals in biotechnology. Gilbert’s proposed new company, Genome Corporation, would carry out mapping and sequencing activities and market sequence information, clones, and services, gathering information more efficiently and economically than individual labs working independently could. His proposal to commercialize the genome appalled many molecular biologists. In the end, however, the stock market downturn of October 1987 left Gilbert unable to raise the money needed to establish the company, and he continued to champion a government-funded project instead.
To help resolve some of the issues, the NRC was asked by leading molecular biologists to conduct a study assessing the feasibility and value of an HGP. The NRC obtained funding from the James S. McDonnell Foundation to conduct its study.
The report, released in February 1988, recommended a fifteen-year project, funded at $200 million per year, to develop linkage and physical maps of the human genome, develop faster methods for sequencing, and ultimately sequence the entire human genome once the technology was available to allow it. To place genetic information in context, the mapping and sequencing of the genomes of other species was also necessary. The report argued that the potential value of this knowledge merited a major commitment by the federal government.
According to the report, funding for the project should come from new sources, so as not to affect other investigator-initiated research negatively. Funds should be awarded to individuals, collaborative groups, and academic centers using peer-review criteria. In contrast to the plan envisioned by the DOE, the report recommended against relying on a small number of centralized sequencing facilities, in order to draw broadly on existing expertise and develop a stronger scientific workforce. The report also stressed the need to develop means to store and disseminate the large amount of data that would be generated by the project.
Finally, the report recommended that oversight be provided by a single agency with a scientific advisory panel. With the endorsement of the NRC, Congress began to explore ways to support the HGP. Leading scientists gave testimony to congressional committees, offering broad visions of future applications and improvements in human health. Two agencies, the DOE and the NIH, competed for oversight of the project. Eventually, an agreement was reached to give the NIH lead responsibility while allowing substantial funding of research through the DOE. The NIH Office of Human Genome Research was created in October 1988, with Nobel laureate James Watson as director. In 1989, it became the National Center for Human Genome Research (NCHGR), with a budget of $59.3 million. The HGP formally began in October 1990, although elements of the project had begun earlier.
The Biology behind the HGP
The HGP grew out of expanding knowledge about the nature of DNA and the development of tools to manipulate it. Two major areas of biological research, genetics and molecular biology, provided the basic information needed to make the HGP a reality.
The discussion below describes the strategies used to apply basic information to the goals of the HGP. These goals include developing linkage and physical maps of all chromosomes, locating genes within the genome, developing better technology for genetic analysis, and ultimately determining the sequence of all three billion base pairs of DNA.
Linkage Mapping, Physical Mapping, and Gene Discovery
Recognizing an inheritance pattern is the first step in determining the association of a gene with a disease or physical trait. The gene must be located within the genome (the total complement of DNA in the organism) and its function determined before there is much possibility of developing gene-based therapies for the disease. Given the immense size of the human genome, how might this be accomplished? Long before scientists in the mid-twentieth century determined that genetic information was carried in the form of DNA, geneticists building on the work of Gregor Johann Mendel (1822–1884) recognized that certain traits tended to be inherited together. This pattern of coinheritance of traits is termed linkage. For example, white domestic cats with blue eyes are often deaf. Linkage suggests that the genes for the two traits are located fairly close to each other on a piece of DNA. Geneticists were eager to discover ways to determine whether a given individual (or fetus) might be carrying an allele (a version of a gene that recognizably affects its function) that causes a potentially devastating genetic disease.
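Linkage can be quantified. If two genes lie close together, crossing over rarely separates them, so the fraction of recombinant offspring is small. The sketch below is a standard textbook calculation rather than something described in the text: it computes a recombination fraction from hypothetical offspring counts and converts it to a map distance in centimorgans, first with the rule of thumb that 1 percent recombination corresponds to about 1 centimorgan, and then with Haldane's mapping function, which corrects for multiple crossovers.

```python
import math


def recombination_fraction(recombinant: int, total: int) -> float:
    """Fraction of offspring in which crossing over separated the two traits."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return recombinant / total


def simple_distance_cm(r: float) -> float:
    """Rule of thumb: 1 percent recombination is roughly 1 centimorgan."""
    return 100.0 * r


def haldane_distance_cm(r: float) -> float:
    """Haldane mapping function d = -(1/2) ln(1 - 2r) Morgans, reported in cM.

    Valid for 0 <= r < 0.5; r = 0.5 is what unlinked genes show.
    """
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    # Hypothetical cross: 8 of 100 offspring show a recombinant combination.
    r = recombination_fraction(8, 100)
    print(f"r = {r:.2f}")
    print(f"simple estimate: {simple_distance_cm(r):.1f} cM")
    print(f"Haldane estimate: {haldane_distance_cm(r):.2f} cM")
```

For tightly linked genes the two estimates nearly coincide; they diverge as the recombination fraction approaches 0.5.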
Given that variations in DNA sequence occur, was it possible to use these variations to predict disease? If variations in DNA sequence could be found that were linked to a given disease, they might be used as a diagnostic tool even if they were not part of the gene itself.
Human gene mapping did not begin until the 1960s, when mouse-human cell fusion, or somatic cell hybridization, was used to associate certain gene products with identified chromosomes. The development of fluorescent dyes that labeled banding patterns in human chromosomes allowed further genetic mapping. The banding patterns in human chromosomes were so distinctive that deletions, translocations, inversions, or other changes could be recognized easily. In 1980, the development of in situ hybridization allowed for more detailed localization. A piece of DNA may be synthesized with a radioactive label to serve as a “probe” for the gene in the chromosome. The DNA within the chromosome is treated to separate the strands, and the probe is allowed to bind to its complementary sequence.
The radioactivity is detected using X-ray film. In a successful experiment, a spot of radioactivity is found on a particular site on a particular chromosome, identifying that region as the gene location.
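The principle behind a hybridization probe is base pairing: a single-stranded probe binds wherever one strand of the denatured DNA carries the complementary sequence. The sketch below is a simplification that assumes perfect matches on a toy sequence (real hybridization tolerates some mismatches and is detected through the probe's label, not by string search); it reports where a probe could pair on either strand.

```python
from __future__ import annotations

# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_sites(probe: str, top_strand: str) -> list[tuple[int, str]]:
    """Positions (0-based, on the top strand) where the probe pairs perfectly.

    A probe pairs with any stretch whose sequence is the probe's reverse
    complement. On the top strand that stretch reads as reverse_complement(probe);
    the bottom strand carries such a stretch wherever the top strand reads
    exactly like the probe.
    """
    probe, top = probe.upper(), top_strand.upper()
    target = reverse_complement(probe)
    k = len(probe)
    sites = []
    for i in range(len(top) - k + 1):
        window = top[i:i + k]
        if window == target:
            sites.append((i, "top"))
        if window == probe:
            sites.append((i, "bottom"))
    return sites


if __name__ == "__main__":
    print(probe_sites("AAGCC", "AAAGGCTTTT"))  # [(3, 'top')]
```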
The resolution of mapping using in situ hybridization may narrow the location of a gene to a region of several million base pairs. To increase the resolution of a map, other techniques are used. This approach, searching for a gene within a fairly large region of DNA, is called “positional cloning.” It takes advantage of variations in DNA sequences to associate the presence of a given allele of the gene with an identified marker. A marker is simply a sequence of DNA whose location is known. Its function (if any) is not known; its usefulness lies in its close linkage to the unknown gene of interest. A region of DNA is treated with a restriction enzyme that cuts the DNA into several fragments at a specific sequence of bases. If there are allelic differences in the sequence, a site for the restriction enzyme may be gained or lost, leading to the production of fragments of different sizes, known as restriction fragment length polymorphisms (RFLPs). The next step is to determine whether a particular pattern of fragments is reliably associated with the disease.
If so, the marker might be used to diagnose the presence of a defective allele. The marker may also be used to narrow the region of the chromosome that contains the gene of interest. RFLP mapping helped to locate the genes for Huntington’s disease and Duchenne muscular dystrophy in 1983. As the number of markers increased, so did the pace of new gene discovery.
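The fragment-length logic of RFLP analysis can be sketched in a few lines. The example assumes the restriction enzyme EcoRI, which recognizes GAATTC and cuts between the G and the first A; the enzyme choice and both toy alleles are illustrative, not taken from the text. A single base change that destroys one EcoRI site merges two fragments into one longer fragment, which is the kind of difference a gel would reveal.

```python
from __future__ import annotations

ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts after the G: G^AATTC


def cut_positions(dna: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """0-based positions at which the enzyme cuts the top strand."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = dna.find(site, start + 1)
    return cuts


def fragment_lengths(dna: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Lengths of the fragments produced by a complete digest, left to right."""
    bounds = [0] + cut_positions(dna, site, offset) + [len(dna)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Two hypothetical alleles differing at one base (A -> C) inside the second site.
    allele_1 = "AAAA" + "GAATTC" + "A" * 10 + "GAATTC" + "AAAA"
    allele_2 = "AAAA" + "GAATTC" + "A" * 10 + "GACTTC" + "AAAA"
    print(fragment_lengths(allele_1))  # [5, 16, 9]
    print(fragment_lengths(allele_2))  # [5, 25]
```

Whether the longer fragment reliably accompanies the disease in a family is a separate, statistical question that this sketch does not address.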
RFLP technology was expanded in the HGP to develop an array of markers called sequence tagged sites (STSs). These are known sequences of DNA regularly spaced on the chromosomes. They serve two purposes: to facilitate the localization of genes by their proximity (linkage) to given STSs, and to help align pieces of DNA in a physical map.
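The second use of STSs, aligning pieces of DNA in a physical map, rests on a simple rule: two clones that contain the same STS must overlap. The sketch below applies that rule to hypothetical clones and groups them into sets connected by chains of shared STSs. It assumes each STS occurs only once in the genome and ignores complications such as chimeric clones; the clone and marker names are invented for illustration.

```python
from __future__ import annotations

from itertools import combinations


def overlapping_pairs(clones: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Pairs of clones that share at least one STS and therefore overlap."""
    return [(a, b) for a, b in combinations(sorted(clones), 2) if clones[a] & clones[b]]


def group_clones(clones: dict[str, set[str]]) -> list[list[str]]:
    """Group clones linked, directly or through other clones, by shared STSs."""
    parent = {name: name for name in clones}

    def find(name: str) -> str:
        # Follow parent links to the group's representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in overlapping_pairs(clones):
        parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in sorted(clones):
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical clones and the STS markers each was found to contain.
    clones = {
        "clone_A": {"sts1", "sts2"},
        "clone_B": {"sts2", "sts3"},
        "clone_C": {"sts3", "sts4"},
        "clone_D": {"sts9"},
    }
    print(overlapping_pairs(clones))  # [('clone_A', 'clone_B'), ('clone_B', 'clone_C')]
    print(group_clones(clones))       # [['clone_A', 'clone_B', 'clone_C'], ['clone_D']]
```

Grouping alone does not fix the order of clones within a group; that comes from which STSs each clone shares with its neighbors.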
A second approach to gene localization was to identify DNA sequences from genes, or expressed sequence tags (ESTs). These sequences could be made from messenger RNA (mRNA) using reverse transcriptase, generating complementary DNA sequences (cDNAs). Fragments of these cDNAs could be used to hybridize to genomic DNA, thereby marking a region as containing a gene. It is important to note that ESTs do not determine the function of the gene but only its location. The development of ESTs caused controversy in 1991, when Craig Venter revealed that the NIH was filing patent applications on thousands of ESTs, although nothing was known about the genes of which they were fragments (see below).
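The base-pairing steps behind a cDNA can be written out directly. In the sketch below (toy sequences, exact matching only), reverse transcription produces a first strand complementary to the mRNA, with U in RNA pairing with A; the second strand then reads like the mRNA with T in place of U, and a short piece of it can be searched for in genomic DNA. A real EST spanning an exon boundary would not match genomic DNA as one contiguous stretch, because introns are spliced out of the mRNA.

```python
from __future__ import annotations

# Base pairing when DNA is copied from an RNA template (U in RNA pairs with A).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """DNA strand complementary to the mRNA, written 5' to 3'."""
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(first_strand: str) -> str:
    """Complement of the first strand; it reads like the mRNA with T for U."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]


def find_tag(tag: str, genomic: str) -> list[int]:
    """0-based positions where a tag occurs exactly in a genomic sequence."""
    k = len(tag)
    return [i for i in range(len(genomic) - k + 1) if genomic[i:i + k] == tag]


if __name__ == "__main__":
    mrna = "AUGGCCAAGUGA"                     # hypothetical mRNA fragment
    first = first_strand_cdna(mrna)
    second = second_strand_cdna(first)
    print(first)                              # TCACTTGGCCAT
    print(second)                             # ATGGCCAAGTGA
    tag = second[:8]                          # a short expressed sequence tag
    print(find_tag(tag, "CCCCATGGCCAAGTTT"))  # [4]
```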
Since human somatic cells contain two copies of each chromosome, there is considerable interest in recognizing allelic variations of genes and their potential link to disease or other traits. Differences in gene sequences are frequently the result of a change in a single base pair. These single nucleotide polymorphisms (SNPs) tend to be inherited in blocks on a single chromosome. Such a block of SNPs is termed a haplotype and can be recognized using extensions of RFLP analysis. Over twenty-seven million human SNPs had been identified as of November 2005. Research is necessary to determine which of these chromosomal variants are relevant as markers for disease. An alternative approach is to use highly variable sequences called microsatellites as markers in linkage studies.
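Finding SNPs and reading off a haplotype are, at their simplest, comparisons between sequences. The sketch below assumes two already-aligned, equal-length copies of the same region (real data require alignment and must cope with insertions, deletions, and sequencing errors). It lists the positions where the copies differ, reads each copy's alleles at those positions as its haplotype, and counts the longest run of a short repeat unit, the feature that makes microsatellites so variable. All sequences are hypothetical.

```python
from __future__ import annotations


def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def haplotype(seq: str, positions: list[int]) -> str:
    """Alleles carried by one chromosome at the given SNP positions."""
    return "".join(seq[i] for i in positions)


def longest_repeat_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` (for example 'CA') in seq."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    for start in range(len(seq)):
        count, i = 0, start
        while seq[i:i + k] == unit:
            count += 1
            i += k
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Hypothetical copies of one region from the two chromosomes of a pair.
    chrom_1 = "ACGTTAGCCA"
    chrom_2 = "ACGTCAGCGA"
    sites = snp_positions(chrom_1, chrom_2)
    print(sites)                                                 # [4, 8]
    print(haplotype(chrom_1, sites), haplotype(chrom_2, sites))  # TC CG
    print(longest_repeat_run("GGCACACATT", "CA"))                # 3
```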
Determining whether SNP haplotypes or microsatellites can be used to identify disease genes requires the detailed study of defined human populations.
One of the HGP’s goals was the construction of a detailed linkage map, consisting of markers separated by ever-decreasing distances, as described above. Another goal was the development of a physical map. A physical map consists of fragments of DNA that are aligned in their linear sequence. This is made possible by the cloning of many fragments of DNA produced by treatment with different restriction enzymes. The fragments are inserted into plasmids, which are circular strands of DNA from bacteria. Plasmids may be rapidly and cheaply reproduced, or cloned, providing many copies of the DNA fragment. These plasmids are collected into a library of DNA fragments covering the extent of human DNA. Fragments may then be aligned using STSs or by sequencing the ends of these fragments to determine areas of overlap. Overlapping fragments are assembled into contiguous stretches (contigs), and the order of the contigs along a chromosome is worked out like a giant jigsaw puzzle, using sequence overlap and the markers developed in linkage mapping. Each new gene may then be assigned first to a given chromosome and then to a smaller region within it. Used together, linkage and physical maps narrow down the region in which a given gene might be positioned, reducing the time spent combing the genome for its location.
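Ordering fragments by their overlapping ends can be illustrated with a greedy assembler, a deliberately simple method and not the procedure used by the HGP's mapping groups. Assuming the fragments are error-free, come from one strand, and none is contained inside another, the sketch repeatedly joins the pair of fragments with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. The fragments and the default `min_len` are hypothetical.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge fragments by best overlap; returns the resulting contig(s)."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    frags = list(fragments)
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: each fragment is its own contig
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


if __name__ == "__main__":
    # Hypothetical overlapping fragments of the sequence ATGGCGTACCTTA.
    reads = ["GCGTACC", "TACCTTA", "ATGGCGT"]
    print(greedy_assemble(reads))  # ['ATGGCGTACCTTA']
```

Greedy merging can be misled when repeated sequences create spurious overlaps, which is one reason long repeated stretches of DNA complicate the task.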
The final stage of the HGP is to sequence all the aligned fragments of DNA. This yields the sequence of the human genome. The completion of this part of the project is hampered by long stretches of repeated DNA in noncoding regions. Researchers are challenged to identify how many repeats are present. Other stretches of DNA prove difficult to clone for a variety of technical reasons. Therefore, it is not surprising that the “complete” sequence still contains many gaps. It now appears that less than 5 percent of the total sequence codes for genes.
The planners of the human genome project recognized that handling the vast amount of data generated would present a major challenge. Data come in several forms: markers and map information (both linkage and physical), DNA sequences, DNA fragments in a variety of vectors (DNA libraries), and identified genes. How might all this information be managed? In 1982 the NIH established a gene sequence repository called GenBank that would allow retrieval of gene sequences using newly developed computer programs. Investigators were expected to submit gene sequences at the time of publication of their research; each new sequence was given a unique identification number. By the mid-1980s, however, it was clear that the pace of discovery of new sequences was overwhelming GenBank’s ability to manage them and that a more extensive effort was required.
The late Senator Claude Pepper recognized the importance of computerized information-processing methods for biomedical research and sponsored legislation that established the National Center for Biotechnology Information (NCBI) in November 1988 as a division of the National Library of Medicine at the NIH. The NCBI now maintains GenBank (which contains over forty million sequences), other databases such as RefSeq (a collection of sequences from several species), and numerous other resources for molecular biologists. The NIH is also constructing a library of clones of all human genes, called the Mammalian Gene Collection. Databases are also maintained in Europe and Japan. Researchers and members of the public may access databases at no charge via the Web.
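As one example of programmatic access, NCBI's Entrez Programming Utilities (E-utilities) accept plain HTTP requests. The sketch below builds an `efetch` request for a nucleotide record in FASTA format and parses the reply. The accession shown (NM_000546) is simply an example RefSeq identifier, `fetch_fasta` requires network access, and NCBI's usage guidelines (including limits on request rate) should be checked before making many requests.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore") -> str:
    """URL asking E-utilities for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def fetch_fasta(accession: str) -> dict[str, str]:
    """Download and parse one record (requires network access)."""
    with urlopen(efetch_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(efetch_url("NM_000546"))
```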
Having repositories of DNA sequences is not useful unless there are means to extract information from them. This process, called gene “mining,” required the development of computer algorithms that permit comparison of sequences and recognition of similarities. David Lipman, Eugene Myers, and colleagues, several of them at the NCBI, developed BLAST, one of the most successful of these algorithms, in 1990.
BLAST allows researchers to compare newly discovered sequences with those already in the databases. Sequence alignment and similarity comparisons allow researchers to place new genes among functional families, and to recognize homologies between sequences from different species. BLAST analysis proved enormously helpful in gene identification in a broad range of applications beyond the genome project itself.
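BLAST's speed comes from a seed-and-extend strategy: it looks for short exact word matches between the query and database sequences and extends only those into longer alignments. The toy sketch below shows that idea for DNA with ungapped extension, a +1/-1 scoring scheme, and a simple drop-off cutoff. These settings are illustrative assumptions; the real program also uses substitution matrices, gapped extension, and statistical measures of significance that are not reproduced here.

```python
from __future__ import annotations


def word_hits(query: str, subject: str, word: int) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) pairs where a length-`word` word matches exactly."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - word + 1):
        index.setdefault(subject[j:j + word], []).append(j)
    return [(i, j)
            for i in range(len(query) - word + 1)
            for j in index.get(query[i:i + word], [])]


def extend_hit(query: str, subject: str, i: int, j: int, word: int,
               match: int = 1, mismatch: int = -1, drop: int = 3) -> tuple[int, int, int, int]:
    """Ungapped extension of a word hit in both directions.

    Each direction stops once the running score falls more than `drop` below
    the best score seen so far, or a sequence ends.
    Returns (score, query_start, query_end, subject_start); query_end is exclusive.
    """
    def best_gain(step: int, q0: int, s0: int) -> tuple[int, int]:
        best, run, length, k = 0, 0, 0, 0
        while 0 <= q0 + step * k < len(query) and 0 <= s0 + step * k < len(subject):
            run += match if query[q0 + step * k] == subject[s0 + step * k] else mismatch
            k += 1
            if run > best:
                best, length = run, k
            elif run < best - drop:
                break
        return best, length

    right_gain, right_len = best_gain(+1, i + word, j + word)
    left_gain, left_len = best_gain(-1, i - 1, j - 1)
    score = word * match + right_gain + left_gain
    return score, i - left_len, i + word + right_len, j - left_len


def best_local_hit(query: str, subject: str, word: int = 4) -> tuple[int, int, int, int] | None:
    """Highest-scoring extended hit, or None if no word matches."""
    hits = [extend_hit(query, subject, i, j, word) for i, j in word_hits(query, subject, word)]
    return max(hits, key=lambda h: h[0]) if hits else None


if __name__ == "__main__":
    print(best_local_hit("ACGTACG", "GGGACGTACGGGG"))  # (7, 0, 7, 3)
```

Here the whole query is found intact in the subject, giving a score of 7 over query positions 0 to 7, starting at subject position 3.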
As map and sequence information is generated, algorithms are needed to order fragments in physical maps. Two programs, phrap and phred, developed by Phil Green and Brent Ewing at the University of Washington and Washington University in St. Louis, have been heavily used for these purposes. Phred, published in 1998, is particularly useful for automatically interpreting sequence data.
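Phred reports, for each base it calls, a quality value q tied to the estimated probability P that the call is wrong by q = -10 log10 P, so q = 20 means about a 1 in 100 chance of error and q = 30 about 1 in 1,000. The small sketch below converts in both directions and sums per-base error probabilities to estimate how many errors a read is expected to contain; the quality values in the example are hypothetical.

```python
from __future__ import annotations

import math


def phred_quality(p_error: float) -> float:
    """Quality value q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(quality: float) -> float:
    """Inverse of phred_quality: P = 10 ** (-q / 10)."""
    return 10.0 ** (-quality / 10.0)


def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read, given per-base qualities."""
    return sum(error_probability(q) for q in qualities)


if __name__ == "__main__":
    print(round(phred_quality(0.001), 6))               # 30.0
    print(round(expected_errors([30, 30, 20, 10]), 6))  # 0.112
```

Per-base quality values of this kind let downstream software weigh reliable base calls more heavily than doubtful ones.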
Phred proved useful for Venter’s “shotgun” approach to sequencing the human genome. Additional programs allow for alignment of the many cloned DNA fragments within chromosomes.
A particularly difficult challenge is identifying genes in the finished sequence of human DNA [12, 15]. Surprisingly, researchers cannot agree on how many genes the genome contains. Estimates made before the HGP were in the range of a hundred thousand genes; current estimates range from twenty-five to forty-five thousand genes, with most researchers predicting numbers at the low end.
This is only about twice the number found in C. elegans (roundworm) and Drosophila (fruit fly), two model organisms whose genomes have been sequenced. BLAST analysis helped researchers discover many families of related genes by identifying sequence homologies.
Many genes are not members of gene families. How might they be identified?
ESTs are powerful tools in that they are fragments of expressed genes. Genes may be missed, however, because of their small size or because they code not for protein but for RNA. Comparing sequences with those of another species is a particularly powerful approach, since most of our genes are shared with other organisms.
Another approach is to search for common regulatory sequences of genes (promoters) that might signal that a gene is nearby. The search for genes is complicated by the presence of pseudogenes, sequences that share similarities with actual genes but represent nonexpressed evolutionary dead ends. Gene prediction programs such as Ensembl, Genie, and GenomeScan all have limitations, tending either to overestimate or to underestimate the number of actual genes in model systems.
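The simplest gene-finding idea, scanning for open reading frames (a start codon followed in the same frame by codons ending in a stop codon), helps show why prediction is hard. The sketch below scans only the given strand, uses a hypothetical minimum length, and ignores introns, promoters, and RNA-coding genes; programs such as those named above model far more of gene structure, and even they over- or underestimate gene numbers.

```python
from __future__ import annotations

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Open reading frames on the given strand as (start, end), 0-based.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    start and stop codons. Only the first ATG before each stop is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

A real genome scanned this way yields many short chance ORFs and misses spliced or RNA-coding genes, which is why gene counts from prediction programs remain uncertain.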
In conclusion, the increasing emphasis on genetics resulting from the HGP may strengthen perceptions of “genetic determinism,” the view that we are a product of our genes alone. Attitudes about the relationship between genetics and the definition of an individual have far-reaching impacts on many areas of society.
1. Barke, R. Science, technology, and public policy. Washington, DC: CQ Press, 1986.
2. National Science Foundation, Division of Science Resources Statistics. Federal obligations for research, by field of science and engineering and agency: FY 2005 projected. November 2005. Available at <http://www.nsf.gov/statistics/infbrief/nsf06300> (accessed December 18,
3. Moore, D. T. Establishing federal priorities in science and technology. In AAAS science and technology policy yearbook 2002, ed. A. H. Teich, S. D. Nelson, and S. J. Lita, 273–284. Washington, DC: American Association for the Advancement of Science, 2002.
4. Bekelman, J. E., Y. Li, and C. P. Gross. Scope and impact of financial conflicts of interest in biomedical research. JAMA 289, no. 4 (January 22–29, 2003): 454–465.
5. Cook-Deegan, R. The gene wars: Science, politics, and the human genome. New York: W. W. Norton and Co., 1994.
6. Kevles, D. J. Out of eugenics: The historical politics of the human genome. In The code of codes: Scientific and social issues in the human genome project, ed. D. J. Kevles and L. Hood, 3–36. Cambridge, MA: Harvard University Press, 1992.
7. Dulbecco, R. A turning point in cancer research: Sequencing the human genome. Science 231 (1986): 1055–1056.
8. National Research Council, Committee on Mapping and Sequencing the Human Genome. Mapping and sequencing the human genome. Washington, DC: National Academy Press, 1988.
9. Judson, H. F. A history of the science and technology behind gene mapping and sequencing. In The code of codes: Scientific and social issues in the human genome project, ed. D. J. Kevles and L. Hood, 37–80. Cambridge, MA: Harvard University Press, 1992.
10. Weaver, R. F. Molecular biology. 2nd ed. Boston: McGraw-Hill Publishers, 2002.
11. National Center for Biotechnology Information. December 1, 2005. Available at <http://www.ncbi.nlm.nih.gov/> (accessed December 18, 2005).
12. Birney, E., A. Bateman, M. E. Clamp, and T. J. Hubbard. Mining the draft human genome. Nature 409, no. 6822 (2001): 827–828.
13. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. Lipman. Basic local alignment search tool. Journal of Molecular Biology 215, no. 3 (1990): 403–410.
14. Ewing, B., L. Hillier, M. C. Wendl, and P. Green. Base-calling of automated sequence traces using phred: I. Accuracy assessment. Genome Research 8 (1998): 175–185.
15. Snyder, M., and M. Gerstein. Defining genes in the genomics era. Science 300, no. 5617 (April 11, 2003): 258–260.
16. Human Genome Organization. HUGO. November 26, 2002. Available at <http://www.gene.ucl.ac.uk/hugo/> (accessed November 3, 2003).
17. Roberts, L. Controversial from the start. Science 291, no. 5507 (February 16, 2001):
18. Roberts, L. A history of the human genome project. Science 291, no. 5507 (February 16, 2001): 1195–1200.
19. Sulston, J., and G. Ferry. The common thread: A story of science, politics, ethics, and the human genome. Washington, DC: Joseph Henry Press, 2002.
20. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, no. 6822 (February 15, 2001): 860–921.