September 6, 2012

The genomic 99%

Today was the release of phase 2 of the ENCODE project, the effort to catalogue all the stuff in the human genome that isn’t genes. This is a big deal: nearly all our DNA isn’t genes, and ENCODE is a big step towards figuring out what, if anything, it does. (The Herald and Stuff have the Associated Press story, the New York Times has more, and Nature has some good news and comment articles.)

Our chromosomes spend nearly all their time curled into little tangles, and some of the ENCODE experiments looked at which bits of the DNA are actually accessible on the outsides of these tangles. Other experiments measured where ‘transcription factors’, which turn genes on and off, attach to DNA. Others looked at which bits of DNA get transcribed into RNA by cells. For complete information, these experiments need to be done for the whole genome and, because the behaviour of DNA is different in every cell type, for many types of cells. That’s only partially been done, and the project is going to continue indefinitely (or at least as long as they can get money; so far they have spent the equivalent of three full years of the NZ Health Research Council budget, or about 2% of the cost of the Large Hadron Collider).

The headline finding in the news stories is that about three-quarters of the genome can sometimes get copied from the DNA ‘reference’ version to temporary RNA.  We used to think that essentially all RNA copies were from genes, and were made for the purpose of translating the RNA into protein.   Over the years, it has become clear that there’s a lot more varied RNA around than can be explained by making proteins, but ENCODE’s results are much more extreme than expected (by me, at least).  We don’t know what most of the non-gene RNA does, and it’s possible that some of it doesn’t do anything, but some of it must do interesting things that we have no clue about.

ENCODE itself was a great opportunity primarily for US researchers, but the ENCODE results are an opportunity for the whole world, and New Zealand scientists will be looking for ways to take advantage of all this new data.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.