Extending the genetic system

Code Identified
by Prof. Dr Thomas Carell

The genetic code encodes all of the information that each cell requires to function and interact correctly with its environment. The code is constructed from four separate molecules, known as the “canonical” Watson-Crick bases, namely: adenine, cytosine, guanine and thymine. The genetic code arises from the sequence of these four bases – given as A, C, G and T – in the DNA double helix.

Since Watson and Crick discovered the DNA double helix in 1953, we have known how these four bases are arranged in the DNA molecule. From then on, a large part of the international scientific community has been investigating how reader proteins recognise this four-base sequence and translate it into proteins, i.e. the cell's functional units. It has long been known that a fifth base, 5-methylcytosine, exists alongside the four standard bases (Fig.1). This base is used to switch genes on and off. Although all cells contain the same DNA and thus the same sequence data as their “hardware”, the various cell types differ significantly in their functionality (we need only compare a neuron with a skin cell). There must therefore be a higher-order level of information beyond the sequence itself. At this level, decisions are made about which genes are activated in a specific cell type and which are switched off. This is the domain studied by the field of research known as epigenetics. Until 2009, it was widely assumed that our genetic system was built on this 4+1 base model.

5-hydroxymethylcytosine

In 2009, it was discovered that a further base may play a major role in our genome; it is therefore now referred to as the genome's sixth base. The base in question is 5-hydroxymethylcytosine (5-hmC, Fig.1) [1]. Previously, this base was classified as “oxidative damage”, i.e. it was assumed to be a degradation product that is rapidly removed by repair processes whenever it appears in the genome. There are numerous such DNA degradation products. They arise from reactive oxygen species released in our mitochondria during cellular respiration. These reactive species attack the DNA bases and convert them into oxidative lesions. Such “damage products”, a category that until recently included 5-hydroxymethylcytosine, are efficiently detected and removed from the genome by repair enzymes.

In 2009, it was discovered that hydroxymethylcytosine is not merely the result of oxidative stress but is actively produced within our genome by specialised enzymes. Three of these “TET” enzymes (Tet1 to Tet3) have been discovered to date [2]. The TET enzymes are oxidases that carry out their oxidation reactions with the aid of the cofactor α-ketoglutarate and an iron atom bound at the active site. In a first step, they oxidise the genome's fifth base, 5-methylcytosine, to 5-hydroxymethylcytosine. This discovery has fundamentally changed research in epigenetics, the field concerned with gene activation and deactivation. Because regulated gene activation and deactivation is the basis of cell differentiation, the discovery of hydroxymethylcytosine also has a major influence on contemporary stem cell research. Pluripotent stem cells, which arise after the egg cell is fertilised by the sperm cell, are the basis for the development of all types of tissue. During this development, the DNA sequences responsible for forming specialised tissue must be selectively activated and others selectively deactivated. Gene activation and deactivation, together with the processes underlying these switching events, are what allow a complex organism to develop from a fertilised egg cell, known as a “zygote”. The targeted oxidation of methylcytosine to hydroxymethylcytosine, i.e. the conversion of the genome's fifth base into its sixth, is now suspected to play a major role in these switching processes. Recent data support this hypothesis, showing that embryonic stem cells in particular contain surprisingly high levels of hydroxymethylcytosine [2].

Fig. 1 The new DNA bases hmC, fC and caC and their currently postulated repair/removal pathways.

Complex oxidation processes

Other research points to considerably greater complexity in the oxidation of methylcytosine. In 2011, for example, two further cytosine-derived bases were found, which can today be classified as the genome's seventh and eighth bases [3–5]. These bases, formylcytosine and carboxycytosine (fC and caC, Fig.1), are higher-order oxidation products of hydroxymethylcytosine. Research has shown that the TET enzymes not only oxidise methylcytosine to hydroxymethylcytosine, but also carry out further oxidation steps that ultimately yield formylcytosine and carboxycytosine (Fig.1). It is not yet known what controls these oxidation processes or why they proceed stepwise. Nor do we know to what extent the new bases hydroxymethyl-, formyl- and carboxycytosine have biochemical functions of their own, i.e. whether they recruit specific proteins that then take part in gene activation and deactivation. Laboratories around the world are using mass spectrometry-based methods to search for proteins that bind these new bases with high affinity, in order to gain insight into the biochemical processes they regulate [6]. One aspect of the overall picture is, however, becoming clearer: these oxidation processes targeting DNA, and cytosine bases in particular, are crucial for controlled gene (de-)activation. One study has shown, for example, that hydroxymethylcytosine is not recognised by the human DNA repair machinery. It therefore remains in the genome, whereas repair processes identify and excise most of the modified bases that arise in every cell each day. This fact alone, that hydroxymethylcytosine remains in the genome, suggests that we have yet to fully appreciate the importance of its role.
Unlike hydroxymethylcytosine, formylcytosine and carboxycytosine are readily excised from the genome by repair processes. This suggests that these bases are created as Nature's way of removing mC from the genome in a targeted manner. Alongside repair of these bases, direct “removals”, i.e. conversion back to C, are also conceivable: dehydroxymethylation of hmC, deformylation of fC or decarboxylation of caC. We do not yet understand whether repair processes or these direct conversions are actually involved in reactivating genes, or whether they merely remove bases that result from undesired overoxidation within the genome. Are formyl- and carboxycytosine damage products, formed by excessive and deleterious activity of the TET oxidases? Or is the formation of formyl- and carboxycytosine a deliberate and biochemically necessary step that triggers processes we have yet to fully comprehend?

Recent research has also shown that the TET enzymes can oxidise small quantities of the canonical base thymine to hydroxymethyluracil (hmU). Hydroxymethyluracil, too, is efficiently detected and repaired by the human DNA repair system. Here as well we have to ask whether hydroxymethyluracil is simply a product of oxidative stress, or whether its formation and subsequent repair act as triggers for unknown biochemical processes.

All of these observations suggest that gene activation and deactivation are closely interwoven with the various DNA repair processes. DNA sequences are methylated and thus deactivated. Oxidation of methylcytosine produces hydroxymethylcytosine, formylcytosine and carboxycytosine, with hydroxymethylcytosine itself remaining stable in the genome. Formylcytosine and carboxycytosine, on the other hand, trigger DNA repair processes or direct conversion to C. A similar picture emerges for thymine, which the same TET enzymes oxidise to hydroxymethyluracil, which in turn also triggers repair. In all cases, these repair or conversion processes replace the oxidised bases hmC, fC, caC and hmU with the canonical bases cytosine (C) and thymine (T), after which the process can begin again. This generates a cycle of methylation (for C-bases only), oxidation and removal of the oxidised bases, followed by their replacement with “fresh” cytosine or thymine. The resulting picture is one of a dynamic genome, in which the sequence information may be static, but gene activity is regulated by oxidation and repair. We have yet to uncover the precise biochemical processes triggered by the oxidised bases. Yet while much about the biochemical regulation of these four oxidised bases remains a mystery, one fact is already clear: our genetic system is much more complex than previously assumed.
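
The methylation, oxidation and removal cycle described above can be pictured as a small set of state transitions. The following is only an illustrative sketch under our own assumptions, not a biochemical model: the names `TRANSITIONS` and `cycles_from` are hypothetical, the base labels are the abbreviations used in this article, and the transitions include only the pathways named in the text (TET oxidation, repair, and the postulated direct conversions back to C). No rates, regulation or other enzymes are represented.

```python
# Hypothetical toy model of the postulated cycle; not a biochemical simulation.
# Each base label maps to the bases it can be converted into, per the text.
TRANSITIONS = {
    "C":   ["mC"],        # methylation (C-bases only)
    "mC":  ["hmC"],       # TET oxidation: fifth base -> sixth base
    "hmC": ["fC", "C"],   # further TET oxidation, or postulated dehydroxymethylation
    "fC":  ["caC", "C"],  # further TET oxidation, or repair / deformylation
    "caC": ["C"],         # repair / decarboxylation back to C
    "T":   ["hmU"],       # TET oxidation of thymine
    "hmU": ["T"],         # repair restores thymine
}


def cycles_from(start: str, max_steps: int = 6) -> list[list[str]]:
    """Return every route that leaves `start` and first returns to it
    within `max_steps` transitions (depth-first search, no repeated bases)."""
    found: list[list[str]] = []

    def walk(path: list[str]) -> None:
        if len(path) - 1 >= max_steps:
            return
        for nxt in TRANSITIONS.get(path[-1], []):
            if nxt == start:
                found.append(path + [nxt])   # closed the cycle
            elif nxt not in path:
                walk(path + [nxt])           # keep following the pathway

    walk([start])
    return found


if __name__ == "__main__":
    for route in cycles_from("C") + cycles_from("T"):
        print(" -> ".join(route))
```

Enumerating routes rather than simulating them keeps the sketch faithful to what the article actually states: which conversions are postulated, not how often they occur. For cytosine it lists three routes back to C (via hmC, fC or caC); for thymine, the single route via hmU.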

Bibliography
[1] Kriaucionis, S. & Heintz, N. (2009) Science 324, 929–930
[2] Tahiliani, M. et al. (2009) Science 324, 930–935
[3] He, Y. F. et al. (2011) Science 333, 1303–1307
[4] Ito, S. et al. (2011) Science 333, 1300–1303
[5] Pfaffeneder, T. et al. (2011) Angew. Chem. Int. Ed. 50, 7008–7012
[6] Spruijt, C. G. et al. (2013) Cell 152, 1146–1159

L&M orient 1 / 2014

The articles are published in issue L&M orient 1 / 2014.

The Author:

Prof. Dr Thomas Carell
