Only a small fraction of our DNA contains genes that encode the proteins that go on to build who we are. So why do we have the rest of our genome?
Over many decades, the moniker “junk” has been broadly used to refer to non-coding sequences in our DNA that appear to lack any function. It was first used in the 1960s to suggest that the majority of our DNA may be expendable. The term “junk DNA” has become very popular, although the label has deterred some from studying these sequences. Who would seriously apply for funding to investigate junk?
In 2001, the first sequenced human genome surprised us all by revealing far fewer protein-coding genes than expected, a count since refined to about 20,000. This is well below the estimated number of different proteins in a cell, which raises two questions: how can so few genes code for hundreds of thousands of different proteins, and to what extent does junk DNA contribute to regulating those genes, such as switching them on and off?
In the last decade, new methods to identify the DNA that is transcribed into RNA (a chemical cousin of DNA) have suggested that about 80% of DNA may serve some purpose. Many thousands of new hypothetical genes that encode only RNA, but not proteins, have been discovered. Some of these strands of RNA are indeed involved in the regulation of genes, such as deciding when to switch them on and when to switch them off.
We are now certain that many non-coding DNA sequences are pivotal in protecting and stabilising the genome, regulating genes, differentiating cells, and forming tissues and organs as they develop from birth to death. They also contribute to differences between people, their variable response to drugs and other environmental cues, and predisposition to a growing number of human diseases. Even so, we still do not know how much junk is in our DNA. But can we find out?
To appreciate the origin and extent of junk DNA in our cells, we need to understand how it evolved. One of the most critical events in evolution, the duplication of genes or their coding parts (called exons), gives cells and organisms a chance to test new functions without endangering their viability or fitness.
As duplicated genes, exons or non-coding DNA diverge through errors in replication or DNA repair over many years, the functions of either new or ancestral copies may change. The cell may select sequences underlying new, similar or even opposite functions, leaving either copy in the genomic scrapyard. This does not mean, however, that the discarded DNA segments are no longer useful to the cell.
Genomes are extremely dynamic entities: new functional elements continuously appear and old ones may become extinct. This can be illustrated by repetitive elements named “Alus”, which are found in primate genomes. They have accumulated a total of over one million copies and occupy about 11% of human DNA. Alus are often transcribed as RNA and are an important source of new coding parts, gene regulatory elements and protein diversity, especially in highly organised tissues such as the brain.
If junk DNA could alter the function of a cell at any time and we can’t define it, how can we safely edit our genomes? Genome editing technologies such as CRISPR are powerful tools to manipulate both coding and non-coding DNA and study their function.
But we cannot exclude the possibility that unintended changes to what simply looks like junk DNA would harm how genes are expressed. Such changes may have inadvertent consequences for the cell or the organism that become apparent only later in life, such as infertility, mental illness or cancer, and that propagate into future generations.
For example, manipulating gene-intervening sequences can modify interactions between proteins. Although current genome editing procedures, including gene editing in human embryos, aim at surgically accurate changes of DNA, they can’t yet fully exclude undesirable alterations in our non-coding or junk DNA reservoir.
It’s deeply rooted in our nature to fear or dismiss what we don’t fully understand. Although we can’t predict which DNA segment may become functional today and which tomorrow, we are now equipped with formidable tools including genome editing to examine the function of non-coding DNA in greater detail in the coming years.
We should support activities that improve this understanding and avoid those that may damage what we may never be able to repair or create again, namely, the irreplaceable heritage of well over one billion years of evolution, including the human genome.
Authors: Igor Vorechovsky, Principal Research Fellow, University of Southampton and Andrew Richard Collins, Chair professor, University of Southampton.