In 1989, molecular biologists in the United States, along with international collaborators, began research to document every chemical component of the human genetic code. The project was expected to take 15 years and cost three billion dollars. The early strategy of the Human Genome Project (HGP) scientists was to identify interesting stretches of a chromosome, usually ones containing a gene implicated in a disease. Small pieces of each stretch of DNA were farmed out to various laboratories for sequencing, which was a very tedious process at that time. Computers were then used to establish how the fragments should be reconnected to restore the original order. At the halfway point in the program, only 3% of the objective had been achieved. [At least they knew the approximate function of each gene studied.]
But an outsider soon shook up that sedate program. Dr. Craig Venter was a scientist who had been refused grant money to sequence the genome of the bacterium Haemophilus influenzae. The funding agency said his approach would not work. The only catch was that by the time the rejection letter arrived, Dr. Venter had nearly completed the sequence, because his approach did work.
Dr. Venter now turned his attention to the Human Genome Project. He was not interested in working with specific genes. What he advocated was sequencing the whole genome all at once. The DNA would be broken into many small pieces, and computers would later figure out the order in which the pieces should be connected and then the identity and function of the various parts. In May 1998, Dr. Venter and the equipment manufacturer Perkin-Elmer announced the formation of Celera Genomics Corporation. This corporation would supply 300 robot machines capable of sequencing the DNA molecule at a then-incredible speed of 1000 nucleotides per second.
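The reassembly step, in which computers work out how small sequenced pieces fit back together, can be illustrated with a toy sketch. The version below assumes error-free fragments with no repeated regions and uses a simple greedy strategy (repeatedly join the two fragments with the longest overlap). The function names and the sample fragments are invented for illustration; this is not the software used by Celera or the HGP, and real assemblers must cope with sequencing errors, repeats, and both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Toy greedy assembly: keep merging the best-overlapping pair."""
    # Remove exact duplicates, then fragments fully contained in another.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate pieces
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping reads, given out of order (hypothetical data).
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

With these three reads the sketch recovers a single 14-letter sequence. When a genome contains long repeated stretches, a greedy approach like this can join the wrong pieces, which is one reason whole-genome assembly was considered so risky.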
Celera’s robots began their work in September 1999. Within seven months the company claimed success. It soon became apparent, however, that the newcomer owed a considerable debt to information posted on the internet by the government-sponsored HGP. So both groups were accorded equal credit. On June 26, 2000, American President Bill Clinton and British Prime Minister Tony Blair jointly announced the successful completion of the first rough draft of the human genome. On April 14, 2003, the project was declared complete.
The next major challenge was to read what the more than three billion nucleotides were communicating to the cell. Comparisons would be made with the genetic code of the fruit fly (Drosophila), roundworm (Caenorhabditis), yeast (Saccharomyces), and mustard plant (Arabidopsis). A computer algorithm would search the human sequence for patterns of the four nucleotides that resemble known genes in these other organisms. The really big surprise was the small number of genes discovered in the human genome. The traditional definition of a gene is that it represents the instructions needed to produce one type of protein molecule. The scientists identified only about 20,000 genes in the whole human genome, far fewer than expected!
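The idea of searching for nucleotide patterns that resemble a known gene can be sketched in a few lines. The example below slides a known sequence along a longer one and reports the window with the highest percentage of matching letters. It is a deliberately simplified stand-in: the function names and sequences are made up for illustration, and actual gene-finding tools use far more sophisticated alignment methods that allow for gaps and assess statistical significance.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(genome, known_gene):
    """Slide known_gene along genome; return (position, % identity) of the best window."""
    k = len(known_gene)
    if k == 0 or k > len(genome):
        raise ValueError("known_gene must be non-empty and no longer than genome")
    best_pos, best_score = 0, -1.0
    for pos in range(len(genome) - k + 1):
        score = percent_identity(genome[pos:pos + k], known_gene)
        if score > best_score:  # keep the first window with the top score
            best_pos, best_score = pos, score
    return best_pos, best_score


# Hypothetical data: a short "genome" and a gene known from another organism.
genome = "TTTTACGTACGATTTT"
known_gene = "ACGTTCGA"
print(best_match(genome, known_gene))
```

Here the best window starts at position 4 and matches the known sequence at 7 of 8 positions, the kind of close but imperfect resemblance that flags a candidate gene for further study.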
The discovery of the low number of genes was completely unexpected. “The challenge we face is nothing less than understanding how this comparatively small set of genes creates the diversity of phenomena and characteristics that we see in human life.” [Peer Bork and Richard Copley 2001. Nature 409 Feb. 15 p. 820] It was to answer that challenging conundrum that a consortium was established in 2003 to discover why so much human genetic information appeared to have no function. Thus was born the ENCODE project.
The bottom line is that scientists immediately realized that the human genome is far more complex than they had imagined. Evidently, the whole system is coordinated and managed in ways that scientists had never thought possible.