When scientists first decided to sequence the human genome, it seemed an impossibly large and complicated challenge. A decade after achieving this aim, scientists are faced with a similarly overwhelming challenge: understanding how a human’s entire genetic data is folded into a tiny cell nucleus.
The human genome consists of deoxyribonucleic acid (DNA), the hereditary material for all living organisms. Each person’s genome resides in a cell’s nucleus and is a sequence of around three billion letters or nucleotides (comprising guanine, adenine, thymine, and cytosine) that encodes around 20,000 protein-coding genes, which dictate the instructions for creating and maintaining a human being.
Stretched out, the DNA in a single cell is about two metres long and must be carefully folded and compressed into a complex three-dimensional structure, which doesn’t tie itself in knots, and fits within the nucleus of the cell.
This is quite an incredible feat, as a mammalian cell nucleus is only around 6 micrometres in diameter, or about one tenth the width of a human hair.
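As a rough check on these scales, the short Python sketch below estimates the stretched-out length of the DNA from the three billion letters mentioned above and compares it with the diameter of the nucleus. It assumes a spacing of about 0.34 nanometres per base pair (a textbook value for the common form of DNA, not a figure from this article) and two copies of the genome per cell. The function and constant names are made up for illustration.

```python
# Back-of-envelope scales. Constants marked "assumed" are not from the article.
BASE_PAIRS_PER_COPY = 3e9        # around three billion letters (from the article)
RISE_PER_BASE_PAIR_M = 0.34e-9   # assumed: textbook spacing per base pair, in metres
COPIES_PER_CELL = 2              # assumed: one genome copy inherited from each parent
NUCLEUS_DIAMETER_M = 6e-6        # around 6 micrometres (from the article)


def stretched_length_m(base_pairs, copies=COPIES_PER_CELL, rise=RISE_PER_BASE_PAIR_M):
    """Length in metres of the DNA in one cell if laid out end to end."""
    return base_pairs * copies * rise


def length_to_nucleus_ratio(length_m, nucleus_diameter_m=NUCLEUS_DIAMETER_M):
    """How many nucleus diameters the stretched-out DNA would span."""
    return length_m / nucleus_diameter_m


length = stretched_length_m(BASE_PAIRS_PER_COPY)
ratio = length_to_nucleus_ratio(length)
print(f"stretched length: {length:.2f} m")
print(f"length / nucleus diameter: {ratio:,.0f}")
```

Under these assumptions the DNA comes out at roughly two metres, several hundred thousand times longer than the nucleus is wide.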
To express our genetic instructions, a gene sequence is copied or “transcribed” into a ribonucleic acid (RNA) molecule, which travels out of the nucleus into the fluid within the cell (cytoplasm), and is “translated” into a final protein product.
A basic schematic of gene transcription and translation. Wikimedia Commons
Scientists are beginning to appreciate how the genome’s three-dimensional structure can help decide which genes are expressed, and which are not. The nucleus is partitioned into regions that are busy and active and where many genes are transcribed, and other compressed regions where genes are silent.
By placing a gene in an active region or tucking it away in a silent one, the genome structure can influence whether that gene is transcribed.
Human genes are made of small parts, called exons, which are separated by long non-coding DNA sequences, called introns. When a gene is transcribed, the intervening introns are cut or “spliced” out, and the exons are strung back together to form the sequence that is then translated into protein.
By including or removing different exons, the same gene can be spliced together in different combinations that are then translated into different protein products.
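The splicing described above can be sketched in a few lines of Python. This is a toy illustration only: the gene, its sequences and the function names are invented, and it simply lists every way of keeping some exons in their original order. In a real cell, splicing is tightly regulated and only some of these combinations are ever made.

```python
from itertools import combinations


def remove_introns(segments):
    """Return the exon sequences, in order, from (kind, sequence) pairs."""
    return [seq for kind, seq in segments if kind == "exon"]


def splice_variants(exons):
    """Return every transcript made by keeping a non-empty subset of exons,
    always joined in their original order."""
    variants = []
    for size in range(1, len(exons) + 1):
        for kept in combinations(range(len(exons)), size):
            variants.append("".join(exons[i] for i in kept))
    return variants


# Hypothetical toy gene: three short exons separated by two introns.
toy_gene = [
    ("exon", "ATG"),
    ("intron", "GTAAGT"),
    ("exon", "GCC"),
    ("intron", "GTCCAG"),
    ("exon", "TAA"),
]

exons = remove_introns(toy_gene)
variants = splice_variants(exons)
for transcript in variants:
    print(transcript)
```

Even this three-exon toy gene yields seven possible transcripts, which gives a sense of how a single gene with many exons can give rise to a variety of protein products.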
High-res, 3D Genome
We analysed vast amounts of data produced by the ENCODE project, a large international collaboration to identify all functional elements of the human genome. From this analysis, we could infer the three-dimensional genome structure with very high resolution, revealing even little folds and kinks in the sequence.
Surprisingly, we found that within a gene, exon sequences are often folded together (along with the start of the gene where the transcription machinery assembles), while the intervening introns are looped out. This folding may help decide which exons are strung together into the final sequence that is translated to protein.
Since the publication of the human genome, scientists have represented its sequence as a three-billion-long line of letters. However, this study adds to a growing appreciation that it is not just the genome sequence that is important, but also the way this sequence is folded.
It would seem the next impossibly big challenge for scientists is to determine and represent the genome not as a line of letters, but as the massive, dynamic and complicated structure it folds into within the nucleus.
Author: Tim Mercer, Postdoctoral Research Fellow in Genetics, The University of Queensland. Top Image: Rowena Dugdale, Wellcome Images