Sequencing the human genome has been one of the crowning achievements of modern biology, not only because of its sheer scale but also because of its potential impact on our understanding of human evolution, physiology, and disease. And yet, unravelling the sequence of bases was the “easy” part. Now comes the hard part: figuring out what this sequence of 3 billion A’s, G’s, C’s, and T’s actually means.
The prospect of analysing such a vast torrent of data has given rise to a new discipline, called bioinformatics, which merges computer science and biology in an attempt to make sense of it all. For example, computer programs that scan DNA for stretches that could code for amino acid sequences are used to estimate the number of protein-coding genes.
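To give a feel for what such a program does, here is a deliberately naive sketch: it scans both DNA strands in all three reading frames for open reading frames (ORFs), stretches that begin with the start codon ATG and run to a stop codon (TAA, TAG or TGA). Real gene-prediction software relies on far more sophisticated statistical models; the function names and the default 100-codon minimum length are illustrative assumptions, not a standard.

```python
# Naive open-reading-frame (ORF) scan: an illustrative sketch, not a real gene finder.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns (strand, start, end) tuples; coordinates are 0-based, end-exclusive,
    include the stop codon, and refer to the strand that was scanned.
    ORFs that run off the end of the sequence without a stop codon are skipped.
    min_codons is an arbitrary, hypothetical cut-off for "long enough".
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # the first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:  # codons before the stop
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [('+', 2, 17)]
```

Counting the ORFs that pass a length threshold gives only a crude upper-level estimate of protein-coding genes; short genes and genes split into many exons are easily missed or miscounted by this approach.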
Such analyses suggest that the human genome contains about 30,000 protein-coding genes, half of which were previously unknown. The striking thing about this estimate is that it means humans may have only about twice as many genes as worms or flies! Computer analysis has also revealed that only about one to two per cent of the human genome actually codes for proteins.
While the remaining DNA contains some important regulatory elements, as well as some genes that code for RNA products instead of proteins, most of it appears to be “junk” DNA with no known function.
Because most genes function by producing proteins, and proteins carry out most cellular functions, scientists are now looking beyond the genome to study the proteome: the structure and properties of every protein produced by a genome. An organism’s proteome is considerably more complex than its genome. For example, the roughly 30,000 genes found in human cells are thought to produce somewhere between 200,000 and a million or more proteins. How can cells produce so many proteins from a smaller number of genes?
In essence, it reflects the fact that an individual gene can be “read” in multiple ways to produce multiple versions of its protein product. The resulting proteins are subject to biochemical modifications that can significantly alter their structural and functional properties.
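One well-established way in which a gene is “read” in multiple ways is alternative splicing, in which some segments of the gene (exons) are either kept or skipped when its RNA copy is processed. The toy model below assumes, purely for illustration, that every optional exon is included or skipped independently, so a gene with n optional exons could yield up to 2^n different transcripts. Real splicing is tightly regulated and only some combinations occur; the exon names are hypothetical.

```python
from itertools import product


def splice_isoforms(exons: list[tuple[str, bool]]) -> list[tuple[str, ...]]:
    """Enumerate exon combinations under a simplified splicing model.

    exons is an ordered list of (name, optional) pairs. Constitutive exons
    (optional=False) are always kept; each optional exon is kept or skipped,
    giving 2 ** (number of optional exons) combinations.
    """
    optional_idx = [i for i, (_, opt) in enumerate(exons) if opt]
    isoforms = []
    for choice in product((True, False), repeat=len(optional_idx)):
        keep = dict(zip(optional_idx, choice))  # optional exon index -> kept?
        isoforms.append(
            tuple(name for i, (name, opt) in enumerate(exons) if not opt or keep[i])
        )
    return isoforms


# Hypothetical gene: E1 and E4 always present, E2 and E3 optional.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for isoform in splice_isoforms(gene):
    print("-".join(isoform))
```

Even this crude model shows how a handful of optional exons multiplies the number of possible products, before any of the biochemical modifications mentioned above are taken into account.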
Identifying the vast number of proteins produced by a genome has been facilitated by mass spectrometry, a high-speed, extremely sensitive technique that uses magnetic or electric fields to separate ionised proteins or protein fragments according to their mass-to-charge ratio. One application of mass spectrometry has been to identify proteins separated by gel electrophoresis by digesting them with specific proteases, such as trypsin, and then measuring the masses of the resulting peptides.
By comparing the measured peptide masses with the masses predicted for the peptides encoded by DNA sequences in genomic databases, researchers can identify the proteins produced by newly discovered genes. Other techniques make it feasible to study the interactions and functional properties of the vast number of proteins found in a proteome.
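The digest-and-compare approach described above (often called peptide mass fingerprinting) can be sketched in a few lines. This is a simplified illustration under several assumptions: the database entries are already translated protein sequences, the observed values have been converted from mass-to-charge ratios into neutral peptide masses, trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P), and there are no missed cleavages or chemical modifications. The scoring rule (count of matching masses) and all function and protein names are hypothetical; real search software uses probabilistic scoring.

```python
# Simplified peptide mass fingerprinting sketch (see assumptions in the lead-in).

# Standard monoisotopic amino acid residue masses, in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for its free N- and C-termini


def tryptic_digest(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P; no missed cleavages."""
    peptides, current = [], ""
    for i, aa in enumerate(protein):
        current += aa
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:  # C-terminal peptide that does not end in K or R
        peptides.append(current)
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_score(observed: list[float], protein: str, tol: float = 0.02) -> int:
    """Count observed masses lying within tol Da of a predicted tryptic peptide."""
    predicted = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed)


def identify(observed: list[float], database: dict[str, str], tol: float = 0.02):
    """Return the best-scoring database entry together with all scores."""
    scores = {name: match_score(observed, seq, tol) for name, seq in database.items()}
    return max(scores, key=scores.get), scores


db = {"protA": "GGKAAR", "protB": "SSKWWR"}  # hypothetical protein sequences
print(identify([260.148, 316.186], db))  # ('protA', {'protA': 2, 'protB': 0})
```

Matching several peptides at once is what makes the method discriminating: a single mass may fit many proteins, but a set of masses rarely fits more than one.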
For example, it is possible to immobilise thousands of different proteins (or other molecules that bind to specific proteins) as tiny spots on a piece of glass smaller than a microscope slide. The resulting protein microarrays can then be used to study a variety of protein properties, such as the ability of the protein in each spot to bind other molecules added to the surrounding solution.
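Reading such an array often comes down to deciding which spots light up more than background. The sketch below assumes a labelled probe molecule gives each spot a signal intensity and that some spots are negative controls; it flags a spot as a binding “hit” when its signal exceeds the control mean by three standard deviations. That threshold rule is a common rule of thumb chosen here as an assumption, and the spot names are hypothetical.

```python
from statistics import mean, stdev


def call_binding_hits(
    spot_signals: dict[str, float],
    negative_controls: list[float],
    k: float = 3.0,
) -> list[str]:
    """Return spots whose signal exceeds mean + k * SD of the negative controls.

    negative_controls needs at least two values for a standard deviation.
    """
    threshold = mean(negative_controls) + k * stdev(negative_controls)
    return sorted(name for name, signal in spot_signals.items() if signal > threshold)


# Hypothetical intensities (arbitrary units) after adding a labelled probe.
spots = {"P1": 500.0, "P2": 120.0, "P3": 131.0}
controls = [100.0, 110.0, 90.0]  # mean 100, SD 10 -> threshold 130
print(call_binding_hits(spots, controls))  # ['P1', 'P3']
```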
Another important feature of the human genome is the way in which its base sequence differs from person to person. The published human genome sequence is actually a mosaic assembled from DNA isolated from 10 different individuals.
In practice, about 99.7 per cent of the bases in your genome will match perfectly with this published sequence, or with the DNA base sequence of your next-door neighbour. But the remaining 0.3 per cent of the bases can vary from person to person, creating the genetic features that make each individual unique.
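The idea of comparing a person’s DNA against the published reference can be illustrated with a short sketch. It assumes the two stretches of sequence have already been aligned base for base and have equal length, so it only detects single-base substitutions and ignores insertions and deletions; real variant detection works on millions of sequencing reads mapped to the reference. The sequences and function names are hypothetical.

```python
def differing_positions(reference: str, individual: str) -> list[int]:
    """0-based positions where two pre-aligned, equal-length sequences differ."""
    if len(reference) != len(individual):
        raise ValueError("sequences must already be aligned to the same length")
    return [i for i, (r, b) in enumerate(zip(reference, individual)) if r != b]


def percent_identity(reference: str, individual: str) -> float:
    """Percentage of positions at which the individual matches the reference."""
    diffs = differing_positions(reference, individual)
    return 100.0 * (len(reference) - len(diffs)) / len(reference)


ref = "ACGTACGTAC"  # hypothetical stretch of reference sequence
ind = "ACGTTCGTAA"  # the same stretch from one individual
print(differing_positions(ref, ind))  # [4, 9]
print(percent_identity(ref, ind))     # 80.0
```

Across a whole genome the same comparison would give an identity of about 99.7 per cent, with each differing position a candidate single-base variant.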
These variations are called single nucleotide polymorphisms, or SNPs. Multiplying 0.3 per cent by the 3.2 billion bases in the human genome yields roughly 10 million SNPs. Scientists have already created a database containing most of the common SNPs. These are thought to be important because such tiny genetic variations may influence how likely you are to develop a particular disease or how well you might respond to a particular treatment.
The impact of this growing body of genetic data is already becoming apparent as discoveries regarding the genetic basis of many human diseases – from breast cancer and colon cancer to diabetes and Alzheimer’s disease – are being reported at a rapidly increasing pace.
Such discoveries promise to revolutionise the future practice of medicine, because the ability to identify disease genes and investigate their function makes it possible to devise medical interventions for alleviating and even preventing disease.
But the ability to identify potentially harmful genes also raises ethical concerns, because all of us are likely to carry genes that place us at risk, and such information could be misused. It may also become possible to alter people’s genes, not only to correct diseases in malfunctioning body tissues but also to change genes in sperm and eggs, and hence the genetic makeup of future generations. How to use these abilities, and how they should be regulated, are clearly questions that concern not only the scientific community but society as a whole.
The writer is associate professor and head, department of botany, Ananda Mohan College.