Distinguishing hypotheses for the appearance of targeted hypomutation

Synopsis

Suppose that we observe a genomic pattern in which the rate of mutation is inversely correlated with functional density, so that mutation rates are lower in genes than in non-genes, as argued by Monroe, et al (2022). One widely discussed interpretation of this pattern of genic hypomutation is that it represents an adaptive adjustment of the mutation rate, reflecting the higher average fitness cost of mutations in genes than in non-genes. A related but distinctly different interpretation, one that has not received much attention, is that the target of adaptive amelioration is unrepaired damage, and that the pattern of genic hypomutation is a side-effect of genic hyper-repair. The question here is how to distinguish the hypotheses of targeted hypomutation (THM) and targeted damage reduction (TDR).

Proposed pattern

To clarify the issue, let us distinguish the alleged pattern from the proposed evolutionary explanation. Analyses of mutation data might reveal a mutation-function anti-correlation, even if we do not understand how this pattern emerged. Monroe, et al (2022) reported that, for Arabidopsis thaliana, mutation rate is “reduced by half inside gene bodies and by two-thirds in essential genes.” An earlier report by Martincorena, et al (2012) found a similar pattern in E. coli and argued for “an evolutionary risk-management strategy”. The empirical claim of a mutation-function anti-correlation is currently disputed by some authors, e.g., Liu and Zhang (2022) or Charlesworth and Jensen (2023).

Theory

1. Mutation load

Classic population-genetics theory tells us that the effect of deleterious mutation can be represented as a load or fitness cost quantified via the total deleterious mutation rate. Under some simplifying conditions, the precise distribution of fitness effects (DFE), which tells us how bad the mutations are, does not matter much, and we only need to be concerned with the total deleterious rate Ud. Intuitively, the reason for this is that, whereas eliminating a new lethal mutation with s = -1 takes place in 1 generation via 1 death, eliminating a new modestly deleterious mutation with s = -0.001 takes place via partial reductions in fitness spread over many descendant-generations (on the order of 1/|s| = 1000), and the two effects roughly cancel out, so that the expected cost of 1 deleterious mutation is 1 selective death, regardless of the degree of severity. On this basis, we can represent the cost of deleterious mutation with a total genomic mutation rate Ud.
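The cancellation described above can be checked numerically. The following is a minimal sketch, assuming a deterministic haploid model with one class of deleterious mutation, no back-mutation, and selection followed by mutation in each generation. The function name and the parameter values are illustrative and do not come from the sources cited here. Note that s in the code is the magnitude of the selection coefficient.

```python
def equilibrium_load(u, s, generations=50_000):
    """Iterate a deterministic haploid mutation-selection model.

    u: deleterious mutation rate per generation (assumed value)
    s: magnitude of the fitness cost of the mutant (0 < s <= 1)
    Returns the load, 1 - mean fitness, after the given number of generations.
    """
    q = 0.0  # frequency of the deleterious class, starting from a clean population
    for _ in range(generations):
        mean_w = 1.0 - q * s               # mean fitness before selection
        q_sel = q * (1.0 - s) / mean_w     # frequency after selection
        q = q_sel + u * (1.0 - q_sel)      # new mutations convert wild-type to mutant
    return q * s                           # load = 1 - mean fitness


U = 1e-5  # illustrative deleterious mutation rate
for s in (1.0, 0.1, 0.001):
    print(f"s = {s:<6} load = {equilibrium_load(U, s):.3e}")
```

In this toy model the equilibrium frequency of the mutant class scales as u/s, so the load q*s comes out close to u whether the mutation is lethal or only mildly deleterious.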

2. Selection of mutation rate modifiers

Now, suppose there is a genetic modifier allele that reduces deleterious mutation by an amount ΔUd. It doesn’t matter how the effect is achieved: it might be achieved by improving the efficiency of repair, by neutralizing chemical agents of damage, or by a behavioral change that reduces UV exposure. For any kind of modifier reducing the deleterious mutation rate, it follows from the previous section that the fitness advantage is s ~ ΔUd.

3. The limits of selection and the Drift Barrier Hypothesis

From this result, another result follows, which is the limited power of adaptive amelioration. Drift imposes a practical lower limit on the sizes of effects of alleles that reduce the mutation rate. Selection cannot overcome the effects of drift unless s >> 1/Ne, and this means that, to be selectable in practice, modifiers must have effects of ΔUd >> 1/Ne.
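To give a feel for the threshold, here is a rough sketch that uses the standard diffusion approximation for the fixation probability of a new mutation in a haploid Wright-Fisher population. It assumes that census size equals Ne and takes s = ΔUd from the preceding section. The function names and the parameter values are illustrative.

```python
import math


def fixation_probability(s, ne):
    """Diffusion approximation for a single new copy in a haploid population.

    P = (1 - exp(-2s)) / (1 - exp(-2*ne*s)), with the neutral limit 1/ne.
    """
    if s == 0.0:
        return 1.0 / ne
    # expm1 keeps precision when s is tiny; both terms are negative, so the ratio is positive
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * ne * s)


def relative_to_neutral(delta_ud, ne):
    """Fixation probability of a modifier with s = delta_ud, relative to a neutral allele."""
    return fixation_probability(delta_ud, ne) * ne


NE = 10**6  # illustrative effective population size
for delta_ud in (1e-8, 1e-6, 1e-5):
    ratio = relative_to_neutral(delta_ud, NE)
    print(f"Ne*dUd = {NE * delta_ud:<6g} fixation relative to neutral = {ratio:.3f}")
```

When Ne*ΔUd is far below 1, the modifier fixes at essentially the neutral rate, so selection contributes almost nothing. The advantage becomes appreciable only once Ne*ΔUd is well above 1.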

Meanwhile, our biological intuition tells us that, if the mutation rate is already low, random mutations are more likely to increase the mutation rate than to decrease it. As a result, we expect a constant pressure of neutral or nearly-neutral fixation of small-effect modifiers, and the direction of this pressure is to increase the mutation rate. The Drift Barrier Hypothesis (DBH) is the idea that genomic mutation rates in nature tend to stay near the neutral limit of adaptive amelioration, reflecting a balance between the upward pressure of neutral or nearly neutral mutation-fixation, and the downward pressure due to adaptive amelioration. For instance, repair DNA polymerases are only used for a tiny fraction of DNA synthesis, compared to the major replicative polymerases, so a given change in their accuracy has a much smaller effect on Ud. Therefore, under the DBH, they will tend to have much higher per-nucleotide error rates than replicative polymerases (see Lynch, et al 2023).

4. Theory of targeted hypomutation

The limited power of adaptive amelioration has been used to argue that we do not expect to see the emergence of a negative correlation between local mutation rate and functional density. The basis for this theoretical claim is that the emergence of this pattern would require targeted amelioration affecting individual sites or genes, and the modifiers responsible are alleged to be unlikely to satisfy the requirement ΔUd >> 1/Ne.

However, Martincorena, et al (2013) identified two exceptions to this reasoning. First, they found that, for microbes with large population sizes and relatively small genomes, conditions may be feasible for selection of a modifier allele that fractionally reduces the mutation rate for a small part of the genome (e.g., a single gene, or even a site). This is one way that targeted hypomutation (THM) could arise gradually by adaptive amelioration of the mutation rate.
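The arithmetic behind the first exception can be sketched as follows. Every number here is an illustrative placeholder rather than a value taken from Martincorena, et al. The per-site mutation rate, the gene length, the fraction of mutations that are deleterious, and the fractional reduction are all assumptions chosen only to show how the product Ne*ΔUd changes with population size.

```python
def scaled_advantage(ne, per_site_rate, target_sites, frac_deleterious, frac_reduction):
    """Return Ne * dUd for a modifier that lowers the mutation rate of a local target.

    dUd is the reduction in the deleterious mutation rate over the targeted
    sites only; all arguments are hypothetical inputs.
    """
    delta_ud = per_site_rate * target_sites * frac_deleterious * frac_reduction
    return ne * delta_ud


# Hypothetical single-gene target: 1000 sites, rate halved, half of mutations deleterious
gene = dict(per_site_rate=1e-10, target_sites=1000,
            frac_deleterious=0.5, frac_reduction=0.5)

for ne in (1e4, 1e6, 1e8):
    print(f"Ne = {ne:.0e}  Ne*dUd = {scaled_advantage(ne, **gene):.2e}")
```

With these placeholder values, the product exceeds 1 only for the largest population size, which is the sense in which the argument is restricted to organisms with very large Ne.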

The second exception is due to the possibility that modifier alleles could take advantage of biochemical signals that (1) are correlated with functionality and (2) affect many dispersed sites simultaneously. In particular, modifiers that target repair enzymes systematically toward functional regions, e.g., by leveraging chromatin marks associated with gene expression, may lead to THM. Repair proteins could be recruited preferentially to DNA sites with the kinds of signals associated with genes, and this could lead to an inverse mutation-function correlation.

This idea received renewed attention when Monroe, et al (2022) argued for a negative correlation between mutation rate and functional density in the Arabidopsis thaliana genome. In subsequent work, Quiroz, et al (2024) showed that, for the Arabidopsis genome, repair proteins are recruited preferentially to genic regions via H3K4me1 histones. This implicates a proximate cause for the pattern of genic hypomutation, but does nothing further to establish an evolutionary explanation.

5. Towards a theory of targeted hyper-repair

An alternative hypothesis to account for genic hypomutation is targeted damage reduction (TDR). In the THM hypothesis, selection shapes deleterious mutation patterns via mutation load. By contrast, in the TDR hypothesis, the fitness cost is not the deleterious phenotypic effect of a mutation, but the deleterious consequences of damage (consisting of pre-mutational lesions) that is not immediately repaired. When DNA has a broken backbone, or a nucleobase modified with a bulky adduct, this can represent a transcription-blocking DNA lesion or TBL. A modifier allele that directs repair enzymes toward expressed genes would allow higher levels of expression, reduce wastage due to aborted transcripts, and mitigate unwanted activities of aborted transcripts (i.e., dominant negative effects).

For instance, in the case of Arabidopsis, we could argue that the reduced mutation rate in genes vs. non-genes is a side-effect of increased repair that was favored, not because it reduced mutations, but because it increased the fraction of time that the genes are in a useable non-damaged state, ready to be expressed (whereas non-genes can be left in a damaged state with no consequences, except during replication).

Currently there is no formal theory for this. To work out such a theory, we must specify some cost function that relates fitness to a quantifiable measure of damage or non-damage, such as the time-averaged occupancy of a transcribable state (vs. the time-averaged presence of 1 or more transcription-blocking lesions). To be useful in an evolutionary theory, the cost function must be able to predict the selection coefficient for a modifier allele that reduces damage.
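As a purely illustrative starting point, and not a worked-out theory, one toy version of such a cost function is sketched below. It assumes that lesions arrive in a gene as a Poisson process at a constant rate and that each lesion is repaired independently at a constant rate. Under these assumptions the number of lesions present at a random time is Poisson-distributed with mean lesion_rate / repair_rate, so the probability of zero lesions is exp(-lesion_rate / repair_rate). The mapping from occupancy to fitness, here a simple proportionality scaled by a hypothetical weight, is an additional assumption, as are all names and values.

```python
import math


def transcribable_occupancy(lesion_rate, repair_rate):
    """Long-run fraction of time a gene carries zero transcription-blocking lesions.

    Assumes Poisson lesion arrivals and independent exponential repair times.
    """
    return math.exp(-lesion_rate / repair_rate)


def modifier_advantage(lesion_rate, repair_rate, fold_repair, weight=1.0):
    """Toy selection coefficient for a modifier that multiplies the repair rate.

    Assumes fitness is proportional to occupancy; `weight` is a hypothetical
    factor for how much of total fitness depends on this gene.
    """
    before = transcribable_occupancy(lesion_rate, repair_rate)
    after = transcribable_occupancy(lesion_rate, repair_rate * fold_repair)
    return weight * (after - before) / before


# Illustrative rates in arbitrary time units
print(transcribable_occupancy(lesion_rate=0.1, repair_rate=1.0))
print(modifier_advantage(lesion_rate=0.1, repair_rate=1.0, fold_repair=2.0))
```

A function of this kind returns a selection coefficient that can be compared directly with 1/Ne, in the same way as ΔUd under THM.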

Distinguishing the hypotheses

One appeal of the damage hypothesis is that damage happens a lot more reliably than mutation (e.g., see Lans, et al), making it a more likely target of selection: to the extent that damage happens reliably to a gene every generation, selection to reduce damage is more like individual selection on immediate harm, and less like lineage selection contingent on chancy mutations. So, if we were to work out a mathematical theory, we might find that the benefits of repair are greater when they are measured in terms of damage remediation than in terms of mutation remediation.

In the absence of a quantitative theory, qualitative predictions based on inequalities are often useful in distinguishing evolutionary hypotheses. The implications of TDR and THM hypotheses will probably show a lot of overlap due to the inter-relations of damage and mutation. Sites of damage are always potentially sites of mutation (though sites of mutation are not always sites of damage). Reducing DNA damage typically will have the side-effect of reducing mutation, and a reduction in mutation will often entail a reduction in DNA damage.

What are some differences between the two hypotheses that could be leveraged to generate contrasting predictions?

One difference is strandedness. If damage (rather than mutation) is the issue, and the disadvantage of damage is mainly a matter of transcription being impeded or rendered faulty, then it is strand-specific. Let’s call the two strands “plus” and “minus”. Suppose that, for a given gene, the transcribed strand is the plus strand. A damaged or missing base on the plus strand would impede transcription, and a nick on the plus strand might result in an aborted product. On the minus strand, the same types of damage will not have such consequences. Thus, the strand matters under the damage hypothesis. But if selection is mainly acting via the downstream (cross-generational) effects of deleterious mutations on the phenotype, then the strand no longer matters.

What this means is that, if there exist repair mechanisms that act in a strand-specific manner, under TDR we expect them to show a bias toward repair of the transcribed strand, whereas under THM a strand-specific mechanism might occur by chance, but there is no reason to expect a strand bias relative to transcription.
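One simple way to look for such a bias, given counts of repair events assigned to each strand, is a one-sided exact binomial test against an even split. The sketch below assumes that events are independent and that both strands are equally exposed to damage, which may not hold for real data. The function name and the counts are hypothetical.

```python
from math import comb


def transcribed_strand_excess_pvalue(transcribed, nontranscribed):
    """One-sided exact binomial p-value for an excess of events on the transcribed strand.

    Null hypothesis: each event falls on either strand with probability 1/2.
    """
    n = transcribed + nontranscribed
    # Probability of seeing at least `transcribed` events on that strand under the null
    tail = sum(comb(n, k) for k in range(transcribed, n + 1))
    return tail / 2 ** n


# Hypothetical counts of strand-assigned repair events in genes
print(transcribed_strand_excess_pvalue(50, 50))  # no excess
print(transcribed_strand_excess_pvalue(80, 20))  # strong excess on the transcribed strand
```

A small p-value indicates a bias toward the transcribed strand, which is what TDR predicts. Under THM there is no particular reason to expect one.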

Another way to find differences in predictions is to think about different pathways of DNA metabolism that differ in the relative chance of sustained damage vs. mutation. For instance, imagine some hypothetical kind of damage that is easy to repair but for which repair often results in mutation, and imagine another kind of damage that takes time and energy to repair but, once repaired, nearly always returns the DNA to the wild-type state rather than inducing a mutation. The first kind of pathway is less costly under TDR and more costly under THM, and vice versa for the second kind of pathway.